VERTEBRATE HOMOLOGUES OF UNC-53 PROTEIN OF C. ELEGANS



The present invention relates to vertebrate 
homologues of UNC-53 protein of C. elegans and cDNA
sequences coding for said homologues or functional 
equivalents thereof. The invention also relates to 
processes for identifying compounds which control cell 
behaviour, compounds identified and pharmaceutical 
compositions containing them in addition to processes 
and assays for identifying disease states in which 
said gene or protein is dysfunctional. 

The control of cell motility, cell shape and
directionality of outgrowth of axons or other
cell processes is an essential feature in the
morphogenesis and function of both unicellular and
multicellular organisms. The control of these
processes is disturbed in a variety of disease states
in which, for example, the receptor tyrosine kinase
(RTK) signal transduction pathways, or the like, or
their downstream intracellular pathways (which are
shared with other extracellular receptors, including
cell adhesion molecules like N-CAMs and integrins) are
overstimulated.

Some cell surface proteins and extra-cellular 
molecules controlling the directionality and potential 
of cell migration have been identified, although the 
processes involved are not generally understood. 

It is generally considered that a long-range
migration of a cell process (also known as a growth
cone extension) is a stepwise event, whereby prior to
and after each extension there is the formation of a
structure at the leading edge of the cell which senses
signals in the environment instructing the cell to
either stabilise a cell process extending in a
preferred direction, or to cause a lamellipodium to
extend a process in a given direction. Localised
stabilisation of the actin cytoskeleton and
association with plus end regions of microtubules is a
general cell biological process underlying the choice
of directional extension. Microtubule binding
directing these processes has not previously been
identified. The present inventors have surprisingly
found that UNC-53 protein of C. elegans and vertebrate
homologues thereof are involved in binding of
microtubules and particularly of plus end regions of
microtubules.



A gene from the free-living nematode
Caenorhabditis elegans designated "unc-53" has been
previously identified and cloned (Abstract,
International C. elegans Meeting, June 1-5 1991,
Madison, Wisconsin, 58, Bogaert and Goh). The present
inventors previously identified UNC-53 protein as a
signal transducer or signal integrator controlling the
directionality of cell migration and/or cell shape in
C. elegans (WO 96/38555). Increased UNC-53 protein
activity was found to be proportional to cell process
extensions in the correct direction of cell migration.
The unc-53 gene was found to encode a signal
transduction molecule that transduces a signal from an
RTK, for example via the adaptor protein
SEM-5/GRB-2, to the machinery controlling directional
growth cone extension or stabilisation, in a highly
dosage-dependent fashion.

Genetic and experimental analysis of C. elegans
UNC-53 mutants showed that mutations in the unc-53
gene do not affect the general ability of cells to
migrate but rather affect the ability of cells to
migrate under specific antero-posterior cues.
Reduction of UNC-53 activity leads to loss of
direction and reduction of growth cone extension, as
indicated by the directionality of random extension
cycles observed in excretory canal growth cones in
UNC-53 mutants.

The function of UNC-53 is highly sensitive to its
dosage or activity. Reduction of function leads to
proportional reduction of migration in response to the
specific signal, while increased expression, using
transgenic expression of UNC-53 in muscle cells, leads to
increased directional migration. The data lead to the
conclusion that UNC-53 functions as an integrator of a
directional signal in the organism whereby reception
of signals leads to growth cone extension in the
correct direction.

Certain alleles of UNC-53 enhance the sex
myoblast migration defect of SEM-5 C. elegans mutants
in a receptor tyrosine kinase signal transduction
pathway (Stern et al. 1993, Mol. Biol. Cell, 4,
1175-1188). While the genetics suggests that UNC-53 and
SEM-5 cooperate to regulate sex myoblast migration,
genetic experiments do not permit a conclusion that
this is the result of a direct molecular interaction.
The inventors previously identified a potential
SEM-5/GRB-2 binding site and showed in two types of
biochemical experiments that UNC-53 physically
interacts with SEM-5. The present inventors conclude
that UNC-53 encodes a signal transduction molecule
that transduces extracellular signals for directional
migration via the adapter protein SEM-5/GRB-2 to the
machinery controlling directional growth cone
extension or stabilization.

Several lines of evidence indicate that UNC-53
might act as an adapter linking extracellular signals
to the actin cytoskeleton. Firstly, UNC-53 shows
homology to cortical actin binding proteins and
is capable of binding F-actin in vitro. In
addition, expression of UNC-53 in mammalian cells leads
to changes in the F-actin cytoskeleton. Very low
levels of UNC-53 expression increase the number of
filopodia and actin microspikes protruding from the
cell surface. Cells expressing UNC-53 also exhibit
increased neurite extension and increased cell
motility. UNC-53 thus also acts as an activator of
migration.

Considering all available data, the following
mechanisms of action of UNC-53 are possible.

The choice and activation of directional growth
cone extension can be accounted for by local
activation of UNC-53 via a SEM-5/GRB-2 complex coupled
to a receptor (e.g. a receptor tyrosine kinase) which
reads a localized or directional signal. Changes in
growth cone steering are preceded by the formation of
a localized actin patch in the area of the growth cone
receiving the highest signal (Bentley and O'Connor et
al. Curr. Op. Neurobiol. 1994, vol. 4, 43-48).
UNC-53 might be directly involved in forming these
actin patches through its own actin binding or
cross-linking properties. Alternatively, activated UNC-53
may (e.g. via its nucleotide binding domain) transduce a
signal to as yet unidentified effectors. For example,
activation of the small GTP-binding protein cdc42 or a
related protein leads to formation of small actin
patches as well as the formation of small filopodia.
The unc-53 pathway may be upstream of cdc42 or both
signal transducers might share downstream pathways.

The present inventors thus decided to investigate
whether a similar protein was present in higher organisms
such as vertebrates.

The present inventors describe the identification
of a family of genes in vertebrates, and particularly
in man and mouse, with extensive structural homology to
UNC-53. The present inventors have surprisingly found
that the nucleotide domains of UNC-53 from C. elegans
and UNC-53 from vertebrates similarly activate
motility, establishing functional equivalence.
Furthermore, these domains are shown to be capable of
transforming NIH3T3 cells in vitro. The inventors
also found changes in RNA transcripts in transformed
cell lines compared to normal human tissues, suggesting
a role for UNC-53 in cell differentiation,
morphogenesis and disease. Furthermore, in vitro
assays and transgenic models are also described that
identify pharmacological modulators of UNC-53 activity
and assays to identify proteins interacting with
UNC-53.

According to a first aspect of the present
invention, there is provided a vertebrate protein
homologue of UNC-53 protein of C. elegans or a
functional equivalent, derivative or bioprecursor
thereof, which protein homologue comprises an amino
acid sequence having a statistically significant
homology to the UNC-53 protein of C. elegans as
illustrated in Figure 2. According to the present
invention a derivative should be taken to mean
mutational derivatives, fusions, internal deletions,
splice variants and muteins.

There is also provided according to a second
aspect of the present invention a vertebrate protein
homologue of UNC-53 protein of C. elegans, which
protein comprises an amino acid sequence having one or
more of sequence homology blocks A, B, C, D or E as
illustrated in Figure 9a, or block F in Figure 12a, or
a sequence having a statistically significant homology
therewith. Preferably, said vertebrate homologue is a
human protein or a mouse protein.

According to a further aspect of the invention
there is provided a vertebrate protein homologue of an
UNC-53 protein of C. elegans, which protein comprises
an amino acid sequence having one or more of sequence
blocks A, B, C, D, E or F which differ from those
blocks of Figure 9a and Figure 12a to a significant
extent only in conservative amino acid changes. In an
even further aspect of the invention there is provided
a vertebrate protein having an amino acid sequence
encoded by the nucleotide sequence from position 1 to
position 6013 as illustrated in Figure 9b. There is
also provided a vertebrate protein having an amino
acid sequence encoded by the nucleotide sequence
illustrated in Figure 11d, or a functional equivalent,
derivative, or bioprecursor of said homologue.

According to a further aspect of the present
invention there is provided a vertebrate protein
having an amino acid sequence corresponding to the
PROSITE signatures as illustrated in Figure 28 for
each of said homology blocks as defined above.
Advantageously the PROSITE signatures can be used to
identify a protein having a statistically significant
homology to the UNC-53 protein of C. elegans (Luethy
et al. 1994, Protein Science, 3, 139-146).

A further aspect of the invention comprises a 
vertebrate homologue according to the invention 
comprising an amino acid sequence as shown in Figure
9b or 11d or an amino acid sequence which differs from
the amino acid sequences shown in these figures to a 
significant extent only in one or more conservative 
amino acid changes. 
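
The notion of two sequences differing "only in one or more conservative amino acid changes" can be illustrated with a minimal sketch. The text does not define which substitutions count as conservative, so the residue grouping below is an assumption (one common side-chain classification), and the function name `differs_only_conservatively` and the example peptides are hypothetical.

```python
# Assumed grouping of the 20 standard amino acids by side-chain character.
# This grouping is an illustrative assumption; the specification does not define it.
GROUPS = ["AVLIM", "FWY", "ST", "NQ", "DE", "KRH", "C", "G", "P"]
GROUP_OF = {aa: index for index, group in enumerate(GROUPS) for aa in group}


def differs_only_conservatively(seq_a: str, seq_b: str) -> bool:
    """Return True if every mismatch between two equal-length sequences
    stays within one of the assumed residue groups."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b:
            continue
        group_a = GROUP_OF.get(a)
        # Unknown residues, or residues from different groups, are treated
        # as non-conservative changes.
        if group_a is None or group_a != GROUP_OF.get(b):
            return False
    return True


# Hypothetical peptides: K->R, V->I and D->E stay within their groups.
same_class = differs_only_conservatively("MKVLD", "MRILE")
# D->W crosses groups (acidic to aromatic).
different_class = differs_only_conservatively("MKVLD", "MKVLW")
```

A real comparison would first align the sequences and would more likely use a substitution matrix; the fixed grouping is used here only to keep the example short.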

In a further aspect of the present invention
there is also provided a nucleic acid molecule, which
is preferably DNA, and which encodes a vertebrate
homologue of UNC-53 protein of C. elegans, or a
functional equivalent, derivative, fragment or
bioprecursor of said homologue according to the
invention. Preferably, the cDNA comprises a sequence
of nucleotides encoding an amino acid sequence as
illustrated in Figures 9b or 11d or an amino acid
sequence which differs from the sequences shown in these
figures to a significant extent only in one or more
conservative amino acid changes. Preferably the DNA is
cDNA, which cDNA comprises at least positions 1
to 6013 of the sequence shown in Figure 9b.
Alternatively the cDNA may comprise the sequence
illustrated in Figure 11d. Also provided by the
present invention is a nucleic acid sequence capable
of hybridising to the nucleic acid or DNA sequences
according to the invention under high stringency
conditions, which conditions are well known to those
skilled in the art.

The cDNA according to the invention may be
included in an expression vector which may itself be
used to transform or transfect a host cell, which cell
may be bacterial or eukaryotic in origin, including,
for example, an animal or plant cell, a fungal
cell or an insect cell. Thus, advantageously, once
the cDNA corresponding to the genome of the vertebrate
homologue of UNC-53 of C. elegans is synthesised,
using, for example, reverse transcriptase or the like,
a range of cells, tissues or organisms may be
transfected following incorporation of the selected
cDNA clone into an appropriate expression vector. The
expression vector according to the invention may
comprise a promoter of C. elegans or one of human,
mouse or viral origin and optionally a sequence
encoding a reporter molecule, such as, for example,
green fluorescent protein.

The present invention, therefore, also further
comprises a transgenic cell, tissue or organism
comprising a transgene capable of expressing a
vertebrate homologue of UNC-53 protein of C. elegans
or a functional equivalent, fragment, derivative or
bioprecursor of said homologue. The term "transgene
capable of expressing a vertebrate homologue of UNC-53
protein of C. elegans" as used herein means a suitable
nucleic acid sequence which leads to the expression of
a vertebrate homologue of UNC-53 protein of C. elegans
[...]. The
transgene may include, for example, genomic nucleic
acid isolated from the appropriate vertebrate or
synthetic nucleic acid including cDNA. The term
"transgenic organism, tissue or cell", as used herein,
means any suitable organism and/or part of an
organism, tissue or cell, that contains exogenous
nucleic acid either stably integrated in the genome or
in an extrachromosomal state.

Preferably the transgenic cell comprises any of
a COS cell, HepG2 cell, MCF-7 or N4 neuroblastoma cell
or a NIH3T3 cell or a colorectal or carcinoma cell or
a human derived cell such as a fibroblast or the like.
The transgenic organism may be an insect, a non-human
animal or a plant and preferably C. elegans or a
related nematode. Preferably, the transgene comprises
the nucleic acid sequence encoding the vertebrate
homologue or a functional fragment of said gene
according to the invention as described above. The
transgene preferably comprises an expression vector
according to the invention.

The term "functional fragment" as used herein
should be taken to mean a fragment of the gene coding
for the vertebrate homologue of the UNC-53 protein of
C. elegans or a functional equivalent or derivative or
bioprecursor of said protein. For example, the gene
may comprise deletions or mutations but may still
encode a functional vertebrate homologue of UNC-53
protein.

Further provided by the present invention is a 
method of producing a mutant vertebrate non-human 
organism or cell having a mutation in the wild-type 
gene coding for the vertebrate homologue of UNC-53 
protein, which mutation affects cell behaviour or the 
regulation of cell motility or the shape or the 
direction of cell migration or microtubule plus end 
stability or function and localisation of protein 
complexes located thereon, which method comprises 
inducing a mutation in the vertebrate homologue of 
UNC-53 protein in said organism or cell. These mutant 
organisms or cells may be used in a screen to identify 
the effects of compounds on these cell functions. 

The vertebrate homologue of UNC-53 protein of
C. elegans or the cDNA or genomic DNA encoding it or a
functional equivalent, derivative, fragment or
bioprecursor of said homologue, may advantageously be
used as a medicament, or in the preparation of a
medicament to promote neuronal regeneration,
revascularisation or wound healing or the treatment of
chronic neurodegenerative disorders or acute traumatic
injuries or fibrotic disease or physiological events
requiring the polarity of cells or epithelia. The
present inventors have also found that the vertebrate
homologue of UNC-53 protein plays a role in a
transformed state of cells. Accordingly, the
vertebrate homologue, dominant positive or negative
mutants thereof, or inhibitors thereof may
advantageously be used to induce or alleviate contact
inhibition in a cell or in preventing cancer
development. Typically, the above medical conditions
may be treated in mammals and more preferably humans
by either a homologue of UNC-53 protein or
alternatively by a nucleic acid coding for such a
protein. Alternatively an antisense oligonucleotide
to said UNC-53 homologue may be used to prevent its
expression. Examples of other nucleic acid sequences
which may be used include 3' untranslated regions of
mRNA which could be used to prevent transcription of
the genomic sequence encoding the vertebrate
homologue of UNC-53 protein.

The vertebrate homologue of UNC-53 protein or a
functional equivalent, fragment or bioprecursor of
said protein may be incorporated into a
pharmaceutically acceptable composition together with
a suitable carrier, diluent or excipient therefor.
The pharmaceutical composition may advantageously
comprise, additionally or alternatively, the nucleic
acid sequence according to the invention as defined
above.

The present invention also provides for a method
of determining whether a compound is an inhibitor or
enhancer of the regulation of cell behaviour, growth,
transformation, cell shape or motility or the
direction of cell migration or microtubule plus end
stability or function and localisation of protein
complexes thereon, which method comprises contacting
said compound with a transgenic cell according to the
invention and screening for a phenotypic change in
said cell. Preferably the method can determine
whether the compound comprises an inhibitor or an
enhancer of the signal transduction pathway of said
transgenic cell of which pathway said vertebrate
homologue of UNC-53 protein, or a functional
equivalent, derivative, fragment or bioprecursor of
said vertebrate homologue, is a component or whether
said compound is an inhibitor or an enhancer of a
parallel or redundant signal transduction pathway in
said cell. The present invention also provides a
method to determine that the protein in said signal
transduction pathway is a vertebrate homologue of
UNC-53 protein of C. elegans or a functional equivalent,
fragment, derivative or bioprecursor of said
vertebrate homologue.

Preferably, the phenotypic change to be screened
comprises a change in cell shape or a change in cell
motility. Where a transgenic cell is used in
accordance with one embodiment of the method of the
invention, an N4 neuroblastoma cell may be used and in
such an embodiment the phenotypic change to be
screened may be the length of neurite growth or
changes in filopodia outgrowth or alternatively
changes in ruffling behaviour or cell adhesion or any
change in microtubule cytoskeleton or any change in
localisation of proteins on plus end regions of
microtubules or any change in cell death such as in
apoptosis. In an alternative embodiment of the method
of the invention, the transgenic cell may comprise an
MCF-7 breast cancer cell. Typically in such an
embodiment the phenotypic change to be screened
comprises the extent of phagokinesis or filopodia
formation. In an alternative embodiment of this
aspect of the invention, the transgenic cell may
comprise an NIH3T3 cell. Typically in such an
embodiment the phenotypic change to be screened
comprises loss of contact inhibition of foci
formation. The method according to the invention may
also utilise a mutant cell or mutant organism
according to the invention as described above, where
the mutant cell is capable of growing in tissue
culture or in vivo and either of which cell or
organism has a mutation in the wild-type unc-53 gene.

In accordance with the present invention, a
"phenotypic change" may be any phenotype resulting
from changes at any suitable point in the life cycle
of the cell, tissue or organism defined above, which
change can be attributed to the expression of the
transgene, such as, for example, growth, viability,
morphology, behaviour, movement, cell migration or
cell process or growth cone extension of cells, and
includes changes in body shape, locomotion,
chemotaxis, contact inhibition, mating behaviour or
the like. The phenotypic change may preferably be
monitored directly by visual inspection of the cell as
a whole or particularly by monitoring the F-actin
cytoskeleton, microtubule network and plus end
stability of microtubules or proteins thereon or
alternatively by, for example, measuring indicators of
viability including endogenous or transgenically
introduced histochemical markers or other reporter
genes, such as, for example, β-galactosidase or green
fluorescent protein.

A compound which is identifiable by the method
according to the invention as described above, as an
enhancer of the processes identified above such as the
regulation of cell shape or motility or the direction
of cell migration, may be used as a medicament, or
alternatively in the preparation of a medicament, for
promoting neuronal regeneration, revascularisation or
wound healing, or for treatment of chronic
neurodegenerative diseases or acute traumatic injuries or
fibrotic disease. Examples of promoting neuronal
regeneration include peripheral nerve
regeneration after trauma and spinal cord trauma.

Where a compound is identified in accordance with
the method described above as being an inhibitor of
the regulation of cell shape etc., the compound may be
used as a medicament, or in the preparation of a
medicament, for substantially alleviating spread of
disease inducing cells, such as in spread of cancer,
or the like in metastasis or in alleviating loss of
contact inhibition. Advantageously, any of the
compounds which may have been identified as an
inhibitor or an enhancer in accordance with the method
as described above, may also be included in a
pharmaceutical composition comprising the respective
compound and a pharmaceutically acceptable carrier,
diluent or excipient therefor.

The particular mechanism of action of a compound
identified as either an inhibitor or an enhancer of
cell motility, shape, growth or direction of cell
migration or of association with microtubules or the plus
end region thereof is not limiting. Preferably the
compound acts as an inhibitor or enhancer of a signal
transduction pathway. The compound may also act on a
parallel pathway or directly on the vertebrate
homologue of UNC-53 protein of C. elegans. For
example, the method of action of the compound may
include direct interaction with the vertebrate
homologue of UNC-53 protein, interaction with
processes for regulating phosphorylation or
dephosphorylation of the vertebrate homologue of
UNC-53 or with processes regulating activity of an unc-53
gene or with processes for post-transcriptional or
post-translational modification or the like.

Preferably the compound is identified by the
method according to the invention as an inhibitor or
an enhancer, by utilising differences of phenotype of
the cell, tissue or organism, which are visible to the
eye. Alternatively indicators of viability including
endogenous or transgenically introduced histochemical
markers or a reporter gene may be used.

According to a further aspect of the invention
there is also provided a transgenic cell or tissue
culture which has been constructed to comprise a
promoter sequence of a gene coding for a vertebrate
homologue of UNC-53 of C. elegans or a functional
equivalent, derivative, fragment, or bioprecursor of
said homologue operably linked to a nucleic acid
sequence encoding a reporter molecule. Preferably,
the reporter molecule
comprises a detectable protein, for example one
which may be monitored by eye inspection such as
antibiotic resistance or β-galactosidase, or a molecule
detectable by spectrophotometric, spectrofluorometric,
luminescent or radioactive assays.

The present invention also provides a method of
determining whether a compound is an inhibitor or an
enhancer of transcription of a gene coding for a
vertebrate homologue of UNC-53 protein of C. elegans,
or a functional equivalent, derivative, fragment or
bioprecursor of said homologue, which method comprises
the steps of:

(a) contacting said compound with a transgenic
cell according to the invention as described
above,

(b) monitoring the level of said reporter
molecule and comparing results obtained from this
monitoring step with a control comprising a
transgenic cell having the promoter sequence of a
gene coding for a vertebrate homologue of UNC-53
protein, or a functional fragment of said
homologue and the reporter molecule, in the
absence of the compound.
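
The comparison in step (b) can be sketched as a simple fold-change calculation between reporter readings from treated cells and from the untreated control. This is a minimal sketch only: the function name `classify_compound`, the 1.5-fold cut-off and the example readings are assumptions for illustration and are not taken from the specification.

```python
from statistics import mean


def classify_compound(treated, control, fold_threshold=1.5):
    """Compare reporter levels with and without the compound.

    treated, control: reporter readings in the same arbitrary units.
    fold_threshold: hypothetical cut-off; a real assay would derive it
    from replicate variability and a statistical test.
    Returns a (label, ratio) pair.
    """
    if not treated or not control:
        raise ValueError("both groups need at least one reading")
    control_mean = mean(control)
    if control_mean <= 0:
        raise ValueError("control mean must be positive")
    ratio = mean(treated) / control_mean
    if ratio >= fold_threshold:
        return "enhancer", ratio
    if ratio <= 1 / fold_threshold:
        return "inhibitor", ratio
    return "no effect", ratio


# Hypothetical readings: reporter signal triples in the presence of the compound.
label, ratio = classify_compound([30.0, 30.0], [10.0, 10.0])
```

The same comparison applies whatever the reporter is (for example a fluorescent protein, an enzyme activity or messenger RNA), provided treated and control readings are in the same units.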

In one embodiment of the method according to this
aspect of the invention the reporter molecule may
comprise messenger RNA.

A compound identified as an enhancer of
transcription of the gene coding for the vertebrate
homologue of UNC-53 protein of C. elegans or a
functional equivalent, derivative or bioprecursor of
said homologue may also be used as a medicament, or in
the preparation of a medicament, for promoting
neuronal regeneration, revascularisation or wound
healing, or for treatment of chronic
neurodegenerative diseases or acute traumatic injuries or
fibrotic disease. Furthermore, such compounds may be
included in a pharmaceutical composition including a
pharmaceutically acceptable carrier, diluent or
excipient therefor. Any compounds identified as
inhibitors of transcription may, advantageously, be
used in alleviating the spread of disease inducing
cells such as cancers or metastasis or loss of contact
inhibition.

The present invention also provides a kit for
determining whether a compound is an enhancer or an
inhibitor of the regulation of cell growth,
transformation, cell motility or shape or the
direction of cell migration, which kit comprises at
least one transgenic or mutant cell or transgenic or
mutant non-human organism according to the invention
as described above and a plurality of wild-type cells
or one organism of the same type, or a cell line or
tissue culture and means for contacting said compound
with said cell or organism.

Also provided by the present invention is a kit
for determining whether a compound is an inhibitor or
an enhancer of transcription of a gene coding for a
vertebrate homologue of UNC-53 protein of C. elegans
or a functional equivalent, derivative or fragment
thereof, which kit comprises at least one transgenic
cell or cells according to the invention and means for
contacting said compounds with said cells.

For the purposes of the present invention, the
term "gene coding for a vertebrate homologue of UNC-53
or a functional fragment of said homologue" includes
the nucleic acid sequence shown in Figures 9b or 11d
or a fragment thereof, including the differentially
spliced isoforms and transcriptional starts of the
nucleic acid sequence and which sequence encodes a
vertebrate homologue of UNC-53 protein or a functional
equivalent, derivative, fragment or bioprecursor of
the protein.

The present invention also provides methods of
identifying genes of vertebrates or fragments of said
genes, which encode proteins which are active in the
signal transduction pathway of which the vertebrate
homologue of UNC-53 is a component. A preferred
method comprises hybridizing to an appropriate cDNA
library a nucleotide sequence, as defined herein, or a
fragment thereof under appropriate conditions of
stringency in order to identify genes having
statistically significant homology with the cDNA
clones of any one of the cDNA sequences according to
the invention described above.

Furthermore, there is also provided by the
present invention a method of identifying a protein
which is active in the signal transduction pathway of
a cell of which a vertebrate homologue of UNC-53
protein of C. elegans or a functional equivalent,
fragment or bioprecursor of said vertebrate homologue
is a component. According to this aspect of the
invention, the method comprises:

(a) contacting an extract of said cell with an
antibody to the vertebrate homologue of UNC-53
protein or a functional equivalent, fragment or
bioprecursor of said protein,

(b) identifying the antibody/vertebrate
homologue of UNC-53 complex, and

(c) analysing the complex to identify any
protein bound to the vertebrate homologue of
UNC-53 protein other than the antibody.

The vertebrate homologue of UNC-53 protein,
therefore, may bind regions of other proteins involved
in the signal transduction pathway. It is also 
possible to sequentially identify a whole range of 
proteins involved in the signal transduction pathway. 

Antibodies to the vertebrate homologue of UNC-53 
protein may be produced according to known techniques 
as would be known to those skilled in the art. For 
example, polyclonal antibodies may be prepared by 
inoculating a host animal, such as a mouse, with a 
protein or epitope of a protein according to the 
invention and recovering immune serum. 

This aspect of the invention further comprises a 
method of identifying a further protein or proteins 
which are active in the signal transduction pathway of 
a cell of which UNC-53 is a component which method 
comprises: 

(a) forming an antibody to the first identified 
protein bound to the vertebrate homologue of 
UNC-53 protein in the method as described above, 

(b) contacting a cell extract with the antibody, 

(c) identifying the antibody/protein complex, 

(d) analysing the complex to identify any 
further protein bound to the first protein other 
than the antibody, and 

(e) optionally repeating steps (a) to (d) to 
identify further proteins in the pathway. 

According to this aspect of the present
invention, the antibody starts the process by binding
to the vertebrate homologue of UNC-53 protein or a
functional equivalent thereof in the signal
transduction pathway. Any other proteins found
complexed to the bound antibody or
UNC-53 protein can then be used to identify further
interacting proteins involved in the pathway.
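
The repeated antibody pull-down of steps (a) to (e) amounts to a breadth-first walk outwards from the starting protein, in which each newly identified partner becomes the target of the next round. The following is a minimal sketch under stated assumptions: `map_pathway` is a hypothetical name, the `pull_down` callable stands in for the wet-lab steps of raising an antibody, contacting the extract and analysing the complex, and the toy interaction table uses placeholder protein names rather than real data.

```python
from collections import deque


def map_pathway(bait, pull_down):
    """Breadth-first walk over co-precipitating proteins.

    bait: name of the starting protein.
    pull_down: callable returning the set of proteins found complexed
    with a given protein (a stand-in for one experimental round).
    Returns the newly identified proteins in order of discovery.
    """
    found = []
    seen = {bait}
    queue = deque([bait])
    while queue:
        current = queue.popleft()
        # sorted() only makes the discovery order reproducible.
        for partner in sorted(pull_down(current)):
            if partner not in seen:
                seen.add(partner)
                found.append(partner)
                queue.append(partner)
    return found


# Placeholder interaction table, purely illustrative.
TOY_COMPLEXES = {
    "UNC-53 homologue": {"protein A", "protein B"},
    "protein A": {"UNC-53 homologue", "protein C"},
}

pathway = map_pathway("UNC-53 homologue",
                      lambda protein: TOY_COMPLEXES.get(protein, set()))
```

Tracking the proteins already seen prevents a round from being repeated when two proteins co-precipitate each other, which corresponds to excluding the starting protein and the antibody in step (d).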

It may also be possible to identify proteins 
involved in the signal transduction pathway of a cell 
of which the vertebrate homologue of UNC-53 or a 
functional equivalent derivative or bioprecursor 
thereof is a component by using a vertebrate homologue 
of UNC-53 protein of C. eleaans . According to this 
aspect of the invention the method comprises: 

(a) contacting an extract of the cell with the 
vertebrate homologue of UNC-53 protein of 
C. elegans or a functional equivalent, 
fragment or bioprecursor of said homologue, 

(b) identifying the vertebrate homologue of 
UNC-53 protein/protein complex formed and 

(c) analysing the complex to identify any 
protein bound to the vertebrate homologue of 
UNC-53 protein other than the same 
vertebrate homologue of UNC-53 protein 

This method can also advantageously be used to 
identify further proteins in a signal transduction 
pathway of a cell by contacting an extract of the cell 
used as described above, with any protein identified 
from step (c) above not being a vertebrate homologue 
of UNC-53 protein and repeating steps (b) and (c).
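
The iterative procedure described above, in which each newly identified protein is in turn used as bait, amounts to a breadth-first walk over an interaction network. The following is a minimal sketch only: `pull_down` is a hypothetical stand-in for the laboratory complex-formation and analysis steps, and the protein names in the toy table are invented placeholders, not proteins disclosed herein.

```python
from collections import deque


def map_pathway(bait, pull_down, max_rounds=None):
    """Iteratively expand the set of proteins reachable from `bait`.

    `pull_down(protein)` is a hypothetical callable standing in for the
    wet-lab step: it returns the set of proteins found complexed with
    `protein`. The result maps each protein to the round in which it
    was first identified (the starting bait is round 0).
    """
    found = {bait: 0}
    queue = deque([bait])
    while queue:
        current = queue.popleft()
        depth = found[current]
        if max_rounds is not None and depth >= max_rounds:
            continue  # do not use this protein as bait in a further round
        for partner in sorted(pull_down(current)):
            if partner not in found:  # skip the bait itself and repeats
                found[partner] = depth + 1
                queue.append(partner)
    return found


# Toy interaction table; all names are illustrative placeholders.
TOY_COMPLEXES = {
    "UNC-53 homologue": {"P1", "P2"},
    "P1": {"UNC-53 homologue", "P3"},
    "P2": set(),
    "P3": {"P1"},
}


def toy_pull_down(protein):
    return TOY_COMPLEXES.get(protein, set())


result = map_pathway("UNC-53 homologue", toy_pull_down)
first_round_only = map_pathway("UNC-53 homologue", toy_pull_down, max_rounds=1)
```

Limiting `max_rounds` corresponds to stopping after a chosen number of repetitions of steps (b) and (c).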

Other methods which may be used for identifying 
proteins in a signal transduction pathway of a cell 
may comprise for example a western blot overlay method 
which method is well known to those skilled in the 
art. Cell extracts are run on gels to separate out
protein and subsequently blotted onto a nylon
membrane. These membranes may then be incubated, for
example in a medium containing a vertebrate homologue
of UNC-53 having a label attached thereto such as a
biotin or radiolabel and any protein conjugates
visualised with for example a streptavidin or alkaline
phosphatase conjugated antibody.

The present invention also advantageously 
provides a process for the preparation of binding 
antibodies which recognise proteins or fragments
thereof involved in the rate and direction of cell
migration or the control of cell growth or shape, for 
the above methods. 

The monoclonal antibody for binding to the
appropriate vertebrate homologue of UNC-53 (or its
functional equivalent) may be prepared by known
techniques as described by Kohler G. and Milstein C.,
(1975) Nature 256, 495 to 497.

Another method which may be used to identify
proteins involved in the signal transduction pathway
of a cell of which a vertebrate homologue of an UNC-53
protein of C. elegans or a functional equivalent or
derivative or bioprecursor is a component involves
investigating protein-protein interactions using the
two-hybrid vector method. This method, which is well
known to those skilled in the art, was first
developed in yeast by Chien et al (1991). This
technique is based on functional reconstitution in
vivo of a transcription factor which activates a
reporter gene. More particularly the technique
comprises providing an appropriate host cell with a
DNA construct comprising a reporter gene under the
control of a promoter regulated by a transcription
factor having a DNA binding domain and an activating
domain, expressing in the host cell a first hybrid DNA
sequence encoding a first fusion of a fragment or all
of a nucleic acid sequence according to the invention
and either said DNA binding domain or said activating
domain of the transcription factor, expressing in the
host at least one second hybrid DNA sequence, such as
a library or the like, encoding putative binding
proteins to be investigated together with the DNA
binding or activating domain of the transcription
factor which is not incorporated in the first fusion;
detecting any binding of the proteins to be
investigated with a protein according to the invention
by detecting for the presence of any reporter gene
product in the host cell; and optionally isolating second
hybrid DNA sequences encoding the binding protein.

An example of such a technique utilises the GAL4 
protein in yeast. GAL4 is a transcriptional activator 
of galactose metabolism in yeast and has a separate 
domain for binding to activators upstream of the 
galactose metabolising genes as well as a protein 
binding domain. Nucleotide vectors may be 
constructed, one of which comprises the nucleotide 
residues encoding the DNA binding domain of GAL4.
These binding domain residues may be fused to a known 
protein encoding sequence, such as for example a 
sequence coding for the vertebrate homologue of 
UNC-53. The other vector comprises the residues 
encoding the protein binding domain of GAL4. These
residues are fused to residues encoding a test 
protein, preferably from the signal transduction 
pathway of the vertebrate in question. Any interaction 
between the vertebrate homologue of UNC-53 protein and 
the protein to be tested leads to transcriptional 
activation of a reporter molecule in a GAL4
transcription deficient yeast cell into which the
vectors have been transformed. Preferably, a reporter
molecule such as β-galactosidase is activated upon
restoration of transcription of the yeast galactose 
metabolism genes. This method enables any 
interactions between proteins involved in the signal 
transduction pathway or a parallel or redundant 
pathway to be investigated. 
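
The read-out logic of the two-hybrid screen described above can be sketched as follows. This is an illustrative model only, assuming a perfect assay in which the reporter is expressed if, and only if, the bait fusion and a prey fusion bind one another; `screen_library` and the candidate names are hypothetical and do not represent experimental data.

```python
def screen_library(bait, library, interactions):
    """Return the prey proteins whose fusion would switch the reporter on.

    `interactions` is a set of frozensets, each naming a pair of proteins
    assumed (for illustration) to bind one another.
    """
    positives = []
    for prey in library:
        # The reporter gene is transcribed only when bait and prey bind,
        # which brings the two separately fused GAL4 domains together.
        if frozenset((bait, prey)) in interactions:
            positives.append(prey)
    return positives


# Hypothetical interaction data for the sketch.
KNOWN_PAIRS = {frozenset(("UNC-53 homologue", "candidate A"))}

hits = screen_library(
    "UNC-53 homologue",
    ["candidate A", "candidate B"],
    KNOWN_PAIRS,
)
```

In practice the second hybrid sequences giving a positive reporter signal would then be isolated and sequenced, as set out above.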

Any proteins identified in the signal 
transduction pathway of the cell, which may be for 
example a mammalian cell, may also be included in a 
pharmaceutical composition together with a 
pharmaceutically acceptable carrier, diluent or
excipient therefor. 

The present invention also provides a process for 
producing a vertebrate homologue of an UNC-53 protein 
of C. elegans or a functional equivalent, fragment, or
derivative of the protein, which process comprises
culturing the cells transformed or transfected with a
cDNA expression vector having any of the cDNA
sequences according to the invention as described
above, and recovering the expressed vertebrate
homologue of UNC-53 protein. The cell may
advantageously be a bacterial, animal, insect or plant 
cell. 

A particularly preferred process for producing a 
vertebrate homologue of UNC-53 protein or a functional 
equivalent, derivative or fragment of said homologue 
comprises using insect cells. Accordingly, the 
invention provides a process for producing a 
vertebrate homologue of UNC-53 protein of C. elegans
or a functional equivalent, fragment, derivative or 
bioprecursor of the UNC-53 protein, which process 
comprises culturing an insect cell transfected with a 
recombinant Baculovirus vector, said vector comprising 
a nucleotide vector encoding the vertebrate homologue 
of UNC-53 protein or a functional equivalent, fragment
or bioprecursor thereof downstream of the Baculovirus
polyhedrin promoter and recovering the expressed
protein. Advantageously, this method produces large
amounts of protein for recovery. The insect cell may
be from for example Spodoptera frugiperda or
Drosophila melanogaster.

In accordance with the present invention, a 
defined nucleic acid sequence includes not only the 
identical nucleic acid but also any minor base 
variations from the natural nucleic acid sequence
including in particular, substitutions in bases which
result in a synonymous codon (a different codon
specifying the same amino acid), due to the degenerate
code in conservative amino acid substitution. The
term "nucleic acid sequence" also includes the
complementary sequence to any single stranded sequence
given which includes the definition above regarding
base variations.
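
By way of illustration of a synonymous base variation, the following sketch checks whether two coding sequences of equal length specify the same amino acid sequence. It assumes the standard genetic code and uses short invented sequences; the function names are hypothetical and the sequences are not taken from any sequence disclosed herein.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a DNA coding sequence in frame 0; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def is_synonymous_variant(reference, variant):
    """True if both sequences have equal length and encode the same protein."""
    return len(reference) == len(variant) and translate(reference) == translate(variant)


reference = "ATGCTTGAA"   # Met-Leu-Glu
silent = "ATGCTGGAG"      # CTT->CTG and GAA->GAG are synonymous codons
missense = "ATGCTTGAT"    # GAA->GAT changes Glu to Asp
```

Here `silent` would fall within the definition of a minor base variation given above, whereas `missense` changes the encoded amino acid.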

Furthermore, a defined protein, polypeptide or
amino acid sequence according to the invention,
includes not only the identical amino acid sequence
but also minor amino acid variations from the natural
amino acid sequence including conservative amino acid
replacements (a replacement by an amino acid that is
related in its side chains). Also included are amino
acid sequences which vary from the natural amino acid
sequence but result in a polypeptide which is immunologically
identical or similar to the polypeptide encoded by the
naturally occurring sequence. Such polypeptides may
be encoded by a corresponding nucleic acid sequence.

A further aspect of the invention provides a 
nucleic acid sequence of at least 15 nucleotides of a 
nucleic acid according to the invention and preferably 
from 15 to 50 nucleotides. 
These sequences may advantageously be used as
probes or primers to initiate replication or the like.
Such nucleic acid sequences may be produced according 
to techniques well known in the art, such as by 
recombinant or synthetic means. They may also be used 
in diagnostic kits or the like for detecting for the 
presence of a nucleic acid according to the invention. 
These tests generally comprise contacting the probe 
with a sample under hybridising conditions and 
detecting for the presence of any duplex formation 
between the probe and any nucleic acid in the sample. 
Nucleic acid sequences according to the invention may 
also be produced using recombinant or synthetic means 
such as described in Sambrook et al (Molecular
Cloning: A Laboratory Manual, 1989). Advantageously,
human allelic variants or polymorphisms of the DNA 
according to the invention may be identified by, for 
example, probing DNA libraries from a range of 
individuals for example from different populations. 
Furthermore, nucleic acids and probes according to the 
invention may be used to sequence genomic DNA from 
patients using techniques well known in the art, such 
as the Sanger Dideoxy chain termination method, which 
may advantageously ascertain any predisposition of a 
patient to certain proliferative disorders. 
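
The use of short sequences of 15 to 50 nucleotides as probes can be illustrated by the following sketch, which derives candidate probes complementary to a target strand and tests for duplex formation. It is a simplified model assuming perfect Watson-Crick matching only (no mismatches and no melting temperature considerations); the function names and the target sequence are invented for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def candidate_probes(target, length=15):
    """List every probe of the given length complementary to `target`."""
    if not 15 <= length <= 50:
        raise ValueError("probe length is expected to be 15 to 50 nucleotides")
    return [
        reverse_complement(target[i:i + length])
        for i in range(len(target) - length + 1)
    ]


def forms_duplex(probe, sample):
    """Perfect-match model: the probe hybridises if the sample strand
    contains the exact sequence complementary to the probe."""
    return reverse_complement(probe) in sample.upper()


# Invented 21-nucleotide target used only to exercise the functions.
target = "ATGGCGTACGTTAGCCTAGGA"
probes = candidate_probes(target, length=15)
```

A real hybridisation assay tolerates some mismatch depending on stringency, so the exact-match test above is deliberately conservative.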

A method of detecting whether a compound is an 
inhibitor or an enhancer of expression of a vertebrate 
homologue of UNC-53 of C. elegans, or a functional
equivalent, derivative or fragment of said vertebrate 
homologue is also provided which method comprises 
contacting a cell expressing said homologue with said 
compound and monitoring for a phenotypic change 
compared to a control cell which has not been 
contacted with said compound. 

Preferably the cell is a transgenic cell as 
described above. Alternatively the cell may have
undergone loss of contact inhibition.

The present method also provides for determining 
whether said compound is an inhibitor of expression of 
said vertebrate homologue. In one embodiment the 
compound to be tested comprises a nucleic acid.

Preferably said nucleic acid sequence comprises 
an antisense DNA sequence or a mRNA sequence. 

Preferably said mRNA sequence comprises 3'
untranslated regions of mRNA encoding for said
vertebrate homologue.

Alternatively, the compound to be tested may be a
protein. Preferably, said protein comprises a protein
having an amino acid sequence potentially suitable for
inhibiting function of said vertebrate homologue and
preferably comprises a protein identified by the
methods as described herein. 

The present invention also provides a 
pharmaceutical composition comprising a compound, for 
example an antisense nucleic acid identified according 
to the above described method together with a
pharmaceutically acceptable carrier, diluent or
excipient therefor.

A nucleic acid sequence or protein identified 
according to this aspect of the invention may be used 
as a medicament, or in the preparation of a
medicament, for treating loss of contact inhibition or
cancer which is mediated by a vertebrate homologue of
UNC-53 protein or a functional equivalent, fragment,
derivative or bioprecursor of said homologue.

Further provided by the invention is a nucleic
acid as defined above for use in preparation of a
medicament for inhibiting expression of a gene coding
for a vertebrate homologue of UNC-53 protein of
C. elegans or a functional equivalent, derivative,
fragment or bioprecursor of said homologue.

According to a further aspect of the invention
there is provided a plasmid pCB201 deposited under
LMBP Accession No. LMBP 3594 and a MCF-7 and a NIH/3T3
cell line transfected with plasmid pCB201 deposited
under LMBP Accession Nos. LMBP 1601CB and LMBP 1603CB
respectively. Further provided by the invention is
phage lambda 3b coding for Hu-UNC-53/1 and deposited
under Accession No. LMBP 1604CB (or 3595). Also
provided are plasmids pLM1 deposited under Accession
No. LMBP 3762, pLM4 (LMBP 3763), pEGFP72 (LMBP 3764)
and pCBSOl (LMBP 3765). Further provided is a Bac
clone comprising a fragment of hu-unc-53/2 gene (LMBP
3773) and a worm strain comprising a chimeric
C. elegans human unc53 gene deposited under LMBP
Accession No. LMBP 1663CB.

Further provided by the invention is an assay for 
detecting expression of a vertebrate homologue of 
UNC-53 protein of C. elegans in a vertebrate cell 
which assay comprises contacting a cell or an extract 
thereof with an antibody to said vertebrate homologue, 
or a functional equivalent, derivative or bioprecursor 
thereof, which antibody is fused to a reporter 
molecule, removing any unbound antibody and monitoring 
for the presence of said reporter molecule. 

Preferably the reporter molecule is an antibody 
conjugated to for example a fluorophore such as
fluorescein or alternatively to an enzyme such as
streptavidin.

There is also provided a method for detecting for 
expression of a gene coding for a vertebrate homologue 
of UNC-53 protein or a functional equivalent, 
derivative, fragment or bioprecursor thereof, which 
method comprises contacting a probe specific for a 
nucleic acid or protein sequence coding for or 
corresponding to said vertebrate homologue or a
functional equivalent, fragment or bioprecursor
thereof with a cell extract, which probe is linked to
a reporter and analysing for the presence of said
reporter.

Preferably the probe is a complementary sequence 
to a region of mRNA transcribed from said gene 
encoding said vertebrate homologue of UNC-53 protein 
or a functional equivalent, derivative or bioprecursor 
therefor. 

Preferably the complementary sequence is a 3' or
5' untranslated region of said mRNA. Preferably said
reporter may be a dig label, a fluorophore, a hapten 
or a radiolabel. 

Alternatively said probe comprises an antibody 
specific for said vertebrate homologue of said UNC-53 
protein or a functional equivalent, derivative, 
fragment or bioprecursor therefor. 

Preferably the reporter is an antibody conjugated 
to for example a fluorophore such as fluorescein or
alternatively an enzyme such as streptavidin. 

As described above UNC-53 protein of C. elegans
has been found to localise to microtubules and
particularly to microtubule (+) ends. Therefore,
there is provided by a further aspect of the present
invention a method of determining whether a compound
is an inhibitor or an enhancer of association of
UNC-53 or a vertebrate homologue thereof according to any
of claims 1 to 9 to microtubules or plus end
regions thereof, which method comprises (a) contacting
said compound with a transgenic cell, tissue or
organism expressing UNC-53 protein or said vertebrate
homologue and which protein is operably linked to a
reporter molecule and (b) screening for the localisation
of said reporter molecule as compared to a cell
according to step (a) which has not been contacted
with said compound.

A compound identifiable by the above method also 
forms part of the present invention. Such a compound 
identified as an inhibitor of localisation or 
association of UNC-53 or said vertebrate homologue 
with microtubules or the plus end region thereof may 
be used in alleviating the spread of disease inducing 
cells or metastasis or loss of contact inhibition. 
Further a compound identified as an enhancer of 
association of UNC-53 or said vertebrate homologue 
with microtubules or the plus end region thereof may 
be used in for example promoting neuronal 
regeneration, revascularisation or wound healing, or 
for treating chronic neurodegenerative diseases or 
acute traumatic injuries or fibrotic disease. These 
compounds may then be included in a pharmaceutical 
composition, together with a pharmaceutically 
acceptable carrier, diluent or excipient therefor. 

Also provided by the present invention is a kit 
for determining whether a compound is an inhibitor or 
an enhancer of association of UNC-53 or a vertebrate 
homologue thereof according to the invention with 
microtubules or the plus end regions thereof, which 
kit comprises at least one transgenic cell expressing 
UNC-53 and a reporter molecule or a host or transgenic 
cell according to the invention and at least one cell 
of the same cell type for use as a control and means 
for contacting said compound with one of said at least 
one transgenic cells. Compounds identified as
inhibitors or enhancers of microtubule association
described above may advantageously be included in a
composition and linked to UNC-53 protein of C. elegans
or a vertebrate homologue thereof according to the
invention to target the compounds to the microtubules
or the plus end regions thereof. Such a composition
may also comprise, for example, a suitable
transfecting or transformation agent.

According to a further aspect of the invention
there is provided a method of targeting a protein to a
cell microtubule or the plus end region thereof, which
method comprises introducing into a host cell, tissue
or organism a transgene comprising a sequence capable
of expressing UNC-53 or a vertebrate homologue thereof
according to the invention, which sequence is operably
linked to a sequence encoding said protein to be
targeted such that a chimeric protein is expressed and
which results in targeting said protein to said
microtubule or a plus end region thereof.

An even further aspect of the invention comprises a method of
identifying a molecule which covalently modifies
UNC-53 or a vertebrate homologue thereof according to the
invention, which method comprises a) contacting either
an extract from a cell or cells expressing UNC-53 or
said vertebrate homologue or a mixture of enzymes
comprising candidate UNC-53 modifying enzymes in the
presence of an indicator of covalent modification of a
protein, b) identifying any covalently modified UNC-53
protein from step a) and c) identifying said molecule
involved in said modification step. Such an indicator
may be 32P.

Further provided by the invention is a method of 
identifying a compound which alleviates or enhances 
the toxicity of UNC-53 or a vertebrate homologue 
thereof according to the invention, or which 

alleviates or enhances apoptosis. The method of the
former comprises contacting said compound with a
transgenic cell, tissue or organism according to the
invention and monitoring for the presence of said
reporter molecule adjacent said microtubules or the
plus end regions thereof. In the case of apoptosis the
method comprises monitoring the effect of the compound
on cell death.

The invention may be more clearly understood from
the following examples which are only exemplary, with
reference to the accompanying drawings wherein:

Figure 1 illustrates the sequence of plasmid
pTB72 which codes for the full length UNC-53 protein
in C. elegans, deposited under LMBP Accession No. 3486.

Figure 2 illustrates the full-length UNC-53
protein from C. elegans.

Figure 3 is a Tblastn search of the EST division
of Genbank with the ORF of the longest known Ce-UNC-53
cDNA, tb3-M5, which reveals two ESTs with homology to a
predicted coiled-coil region in Ce-UNC-53.

Figure 4 illustrates a search of the Genbank
databases with part of the nucleotide binding domain
of Ce-UNC-53. It does not identify statistically
significant proteins except for the C. elegans cosmid
containing Ce-unc-53.

Figure 5 illustrates a three frame translation of
EST gb:R41071.

Regions of homology with Ce-UNC-53 in two
different frames are underlined. The spacing between
the blocks of homology is of similar size to that in
Ce-UNC-53. Subsequent re-cloning and re-sequencing of
this region in man identified multiple sequencing
errors in gb:R41071, and identified an ORF which is more
homologous to and co-linear with Ce-UNC-53 (see
alignment in fig. 12).
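
The appearance of homology blocks in two different frames is the expected signature of an insertion or deletion error in a sequencing read. The following sketch, which assumes the standard genetic code and uses a short invented sequence rather than the EST itself, shows a three frame translation and how a single inserted base moves the downstream protein sequence into another frame. The function name is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def three_frame_translation(dna):
    """Translate the forward strand in frames 0, 1 and 2.

    Codons containing characters other than A, C, G or T are shown as 'X'.
    """
    dna = dna.upper()
    frames = []
    for offset in range(3):
        codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
        frames.append("".join(CODONS.get(codon, "X") for codon in codons))
    return frames


# Invented example: a clean read and the same read with one spurious
# base ("C") inserted after the second codon.
clean_read = "ATGGCTAAAGAACTG"
error_read = "ATGGCT" + "C" + "AAAGAACTG"

clean_frames = three_frame_translation(clean_read)
error_frames = three_frame_translation(error_read)
```

In the clean read the whole peptide lies in frame 0, whereas in the read carrying the insertion the first residues remain in frame 0 and the remainder reappears in frame 1, which mirrors the two-frame homology described for the EST.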

Figure 6 is a BLASTN search of the EST division
of Genbank with Hu-unc-53/1 cDNA cosmid 3b.

Figure 7 is a TBLASTN search of the Genbank
sequence database with the 961 amino acid ORF of cDNA
3b of hu-UNC-53/1: hu-UNC-53/1 forms a unique pair
with Ce-UNC-53 (cosmid F45E10) compared to the rest of
the database.

Figure 8 is a diagram illustrating the length and
overlap and tissue source of the different cDNA clones
of the 3' end of Hu-UNC-53/1 isolated in this work.

Figure 8a is a diagram illustrating the further
sequence of the Hu-UNC-53/1 and overlap of constructs
to obtain the further sequence.

Figure 8b is a diagram illustrating the 3' end
of HU-UNC53/1 and the EST clones present in the
database.

Figure 9a is an annotated sequence listing of
Clone 3b of hu-UNC-53/1 including the [illegible]
GAATTC. The predicted Open Reading Frame of
Hu-UNC-53/1 is listed below the sequence. Blocks A, B, C, D and
E which are similar to Ce-UNC-53/1, a region which is
different between Hu-UNC-53/1 and Hu-UNC-53/2 and the
3' untranslated leader sequences are marked with
arrows and labelled.

Figure 9b is an annotated sequence listing of
Hu-UNC-53/1 available at this moment. The predicted
Open Reading Frames of Hu-UNC-53/1, pLM1, pLM3, pLM4,
pCB251, pLM5 and pCB201, the homology blocks A, B, C, D
and E, the position of a region which is different
between Hu-UNC-53/1 and Hu-UNC-53/2, the position of
phh14-3, pCB212, pCB210-14, phh3b, phh15, the position
of the reverse primers HU53rv1, HU53rv2, HU53rv3 and
HU53rv4, the position of peptides B72628 (=28/1),
B72627, B72626 and B72625 are listed below the
sequence.

Figure 10 is an annotated sequence listing of the 
insert of clone gbAA049124 (EST479167) of mu-UNC-53/1. 
The open reading frame and 3' untranslated sequence is
marked with an arrow. 

Figure 11a is an annotated sequence listing of
the insert of clone gbH09036 (EST46037) of Hu-UNC-53/2.

Figure 11b is a novel DNA sequence of HU-UNC-53/2
extended by RT-PCR. This DNA sequence is not present
in EST46037 and extends the ORF beyond position 1109
of Figure 11a to an ORF from position 18 to 1793.

Figure 11c summarises how the 3' and 5'
extensions of hu-unc-53/2 were made.

Figure 11d compiles the sequence of hu-unc-53/2.
The boxed sequences are the primer sequences used for
the respective extension steps described in the
experimental methods section.

Figure 11e illustrates the sequences of the
extensions summarised in figure 11c.

Figure 11f illustrates the sequence information
illustrating four alternative start sites observed for
hu-unc-53/2.

Figure 12 is an illustration of a Tblastn search
of the EST division of Genbank with 680aa starting at
the C-terminus of the alpha actinin domain of
hu-unc-53/2.

Figure 12a is an illustration of an amino acid
alignment of the available sequence of C. elegans
unc-53 and hu-unc-53/1 and hu-unc-53/2.

Figure 12b is an illustration of similarity plots for
Ce-unc-53 and hu-unc-53/1 (top) and for hu-unc-53/1
and hu-unc-53/2.

Figure 13 is an annotated sequence listing of
expression vector pCB201 containing homology block E
from Hu-UNC-53/1 cloned in a pcDNA3.1-HIS expression
vector. The HIS and T7-tags, PCR primer used to
modify hu-UNC-53/1 and ORF are marked.

Figure 14 is a diagram showing the alignment of 
the homologous regions of hu-UNC-53/1 and mu-UNC-53/1. 

Figure 15 is an annotated sequence listing of
expression vector pCDU3 containing part of Ce-UNC-53/1
cloned in expression vector pcDNA3.1. The upper ORF
starts in the vector polylinker. The lower ORF starts
at the first Methionine and is part of Ce-UNC-53/1.

Figure 16 is an annotated sequence listing of
expression vector pCDU4 containing part of Ce-UNC-53/1
cloned in expression vector pcDNA3.1. The upper ORF
starts in the vector polylinker. The lower ORF starts
at the first Methionine and is part of Ce-UNC-53/1.

Figure 17 is an annotated sequence listing of
expression vector pCDU2 containing part of Ce-UNC-53/1
cloned in expression vector pcDNA3.1. The upper ORF
starts in the vector polylinker. The lower ORF starts
at the first Methionine and is part of Ce-UNC-53/1.

Figure 18 illustrates MCF-7 cells transfected
with pCB201 (upper) compared to mock transfected MCF-7
cells (phase contrast image). The control cells are
spread out on the tissue culture plastic and
exhibit few filopodia outgrowths. The transfected
cells appear smaller because they are slightly rounded
up and have multiple filopodia outgrowths per cell
(arrowheads).

Figure 19 is a phase contrast image of MCF-7
cells, transfected with pcDNA3.1 (19a), pCDU4 (19b),
pCDU3 (19c), pCDU2 (19d) and pTB72 (19e).

Figure 20 is an F-actin pattern (visualized with
TRITC-Phalloidin) of MCF-7 cells transfected with
pcDNA3.LacZ (top panel) and with pCB201 (middle and
lower panel).

Figure 21 is an F-actin pattern
(visualised with TRITC-Phalloidin) of MCF-7 cells
transfected with pcDNA3.1 (21a), pCDU4 (21b),
pCDU3 (21c), pCDU2 (21d) and pTB72 (21e).

Figure 22 is a phase contrast image of N4
neuroblastoma cells transfected with pcDNA3.1 (22a),
pCDU4 (22b), pCDU3 (22c), pCDU2 (22d) and pTB72 (22e).

Figure 23 is an F-actin pattern
(visualised with TRITC-Phalloidin) of N4 neuroblastoma
cells transfected with pcDNA3.1 (23a), pCDU4 (23b),
pCDU3 (23c), pCDU2 (23d) and pTB72 (23e).

Figure 24 illustrates phase contrast images of 
small (top) , medium (middle) and large foci (bottom) 
induced in a monolayer of NIH3T3 cells by transfection 
with pCB201. 

Figure 25(c) illustrates human metaphase
chromosomes probed with a probe 1p34 and figures 25a
and 25b indicating the chromosomal location of
hu-UNC-53/1 in 1q31. Essentially the same techniques were
used to assign the gene hu-unc-53/2 to chromosome
locus 11p15 (25d and e) as illustrated in micrograph
25f.

The ideograms 25a and 25d are from the
International System for Human Cytogenetic Nomenclature
1985. The ideograms 25b and 25e in which the relative
band positions and arm ratios were derived from actual
chromosome measurements are from Cytogenet Cell Genet
65:206-219 (1994).

Figure 26 is an expression pattern of HU-Unc53/1
and HU-Unc53/2 in normal human tissues and cancer cell
lines.

Figure 27 is a sequence map of plasmid pNP3.

Figure 28 is an exemplary list of prosite
signatures which can be used to define and identify
vertebrate homologues of UNC-53.

Figure 29 is an annotated sequence map of plasmid
pEGFPsac. The GFP-C. elegans unc53sac fusion protein,
and the C. elegans unc53 sac fragment are indicated.

Figure 30 is a sequence map of plasmid pEGFP72. 
The GFP-C. elegans unc53 fusion protein and the 
C. elegans unc53 fragment are indicated. 

Figure 31 is an annotated sequence map of plasmid
pEGFPsma. The GFP-C. elegans unc53sma fusion protein,
and the C. elegans unc53 sma fragment are indicated.

Figure 32 is an annotated sequence map of plasmid
pEGFPecl. The GFP-C. elegans unc53ecl fusion protein,
and the C. elegans unc53 ecl fragment are indicated.

Figure 33 is an annotated sequence map of plasmid
pEGFPxba. The GFP-C. elegans unc53xba fusion protein,
and the C. elegans unc53 xba fragment are indicated.

Figure 34 is an annotated sequence map of plasmid
pLM4. Open reading frames of the hu1-unc53/1 and GFP
are indicated.

Figure 35 is a sequence map of plasmid [name illegible].

Figure 36 is an illustration of microtubule
association of C. elegans Unc53, shown in HepG2 cells,
transiently transfected with pTB72, expressing
C. elegans Unc53. Panel A: microtubule staining of
HepG2 cells, using YL1/2. Panel B: C. elegans Unc53
staining, using rab4.

Figure 37 is an illustration of microtubule
plus-end association in human cell lines transiently
transfected with pTB72, expressing C. elegans Unc53.
C. elegans Unc53 was stained with mab-16-48. Panel C:
COS cells showing microtubule association. Panel B:
MCF7 cells showing microtubule plus-end association.
Panel A: HepG2 cells showing microtubule plus-end
association.

Figure 38 is an illustration of microtubule
association in N4 cells transiently transfected with
pEGFP72, expressing the GFP-C. elegans Unc53 fusion
protein. GFP fluorescence was observed in living
cells. Panel A: microtubule association of the
GFP-C. elegans unc53 fusion protein. Panel B: microtubule
plus-end association of the GFP-C. elegans unc53 fusion
protein.

Figure 39 is an illustration of microtubule
association in N4 cells transiently transfected with
pEGFP72, expressing the GFP-C. elegans Unc-53 fusion
protein. Microtubules were stained with YL1/2 after
paraformaldehyde fixation. Panel A: Microtubule
association of the GFP-C. elegans unc53 fusion protein.
Panel B: tubulin staining. Panel C: panel A plus
panel B: co-localisation of the GFP-C. elegans unc-53
fusion protein and tubulin can be seen as yellow.

Figure 40 is an illustration of microtubule 
association in N4 cells, transiently transfected with 
pEGFPsma, expressing the GFP-C. elegans unc53sma fusion 
protein. Panel A: Microtubule association of the GFP- 
C. elegans unc53sma fusion product. Panel B: Centriole 
association of GFP-C. elegans unc53sma fusion product 
when expressed at low levels. 

Figure 41 is an illustration of microtubule 
association in N4 cells, transiently transfected with 
pEGFPecl, expressing the GFP-C. elegans unc53ecl fusion 
protein. Panel A: Microtubule association of the GFP- 
C. elegans unc53ecl fusion product. Panel B: Centriole 
association of GFP-C. elegans unc53ecl fusion product
when expressed at low levels. 

Figure 42(a) and Figure 42(b) are illustrations of
fluorescence of GFP in N4 cells transiently
transfected with pEGFPxba and pEGFPsac respectively.

Figure 43 is an illustration of microtubule
association in N4 cells transiently transfected
with pLM4 expressing GFP-Hu-UNC53/1 fusion protein.
Panel A: microtubule association of GFP-HU-UNC53/1
fusion protein. Panel B: microtubule plus-end
association of GFP-HU-UNC53/1 fusion protein. Panel
C: microtubule association of GFP-HU-UNC53/1 in
dividing cells (end of division).

Figure 44 is an illustration of the sequence of
plasmid pNP9.

Figure 45 is an illustration of
immunofluorescence in melanoma G361 cells stained with sera
28.1. Panel A: Microtubule plus-end association of
Hu-UNC53/1. Panel B: microtubule plus-end association
of hu1-Unc53 in growth cone extensions.

Figure 46 is an illustration of GFP fluorescence
and immunofluorescence in N4 cells transiently
transfected with pLM4, and stained with sera 28.1.
Panel A: Fluorescence of GFP-Hu-UNC53/1 fusion
protein. Panel B: Immunofluorescence of serum 28.1.

Figure 47 is an overview of the microtubule (+)
end, the microtubule and F-actin cytoskeleton binding
properties of different constructs.

Figure 50 is an illustration of rescue of lateral
ALN neurons in mutant unc-53.

Dorsal view of the ALN neurone axons visualised
by GFP fluorescence with the transgene pA/GFP in the
posterior of an adult. (c) cellular body.

a) wild type, anterior axon (aa) migrates in a
straight line along the body until reaching the head,
on the dorsal sublateral cord, posterior axon (ap)
migrates into the tail;

b) unc-53 (n152), anterior axons are shorter, stop
ahead of the vulva region and form numerous collateral
branches towards the dorsal cord;

c) unc-53 (n152), pA/unc-53 anterior axons no longer
form branches and migrate in a straight line into the
head, as in the wild type at a).

Scale bar 10 µm.

Figure 51a is an illustration of the chimeric 
fusion between C. elegans unc-53 and human homologue 1 
of the unc-53 gene. The region of the putative nucleotide 
binding domain (NTP) is replaced in the C. elegans 
cDNA by the same region of human homologue 1 of 
unc-53 (H1). The cDNA is under the promoter region A 
(pA) of unc-53, which drives expression in the ALN 
lateral neurons. 

Figure 51b is an illustration showing that the chimeric 
nematode/human minigene pA/unc-53-H1 partially rescues 
the defect in the longitudinal migration of the 
lateral neurons ALN and PLN. The four strains 
compared are: wt; unc-53(n152); unc-53(n152), 
pA/unc-53; and unc-53(n152), pA/unc-53-H1. The 
observed phenotypes are put in three classes: 
"wild type", the axon is straight, unbranched, and 
migrates as far as the head; "vulva", the axon is 
straight, unbranched, and stops in the vulva region; 
"mutant", the axon is short, never reaches the vulva 
region and makes many collateral branches. Numbers 
are percentages. The number of observed axons is 
noted in the last column. The chimeric fusion between 
the C. elegans gene and the human homologue (unc-53-H1) 
partially rescues the mutant phenotype. The chimeric 
gene was made by replacing the putative nucleotide 
binding region (NTP) of the nematode cDNA with the same 
region of human homologue 1 (H1). 

Figure 52 is an illustration of the sequence for 
plasmid pLM5. 

Figure 53 is an illustration of the sequence for 
plasmid pLM6. 

Figure 54 is an illustration of the sequence for 
plasmid pLM1. 



Figure 55 is a sequence map of plasmid pCB251. 

Figure 56 is a sequence map of plasmid pNP10. 

Figure 57 is a sequence map of plasmid pCB501. 

Figure 58 is a sequence map of plasmid pTB115. 

Figure 59 is a sequence map of plasmid pPD95.75. 

Figure 60 is a sequence map of clone X16. 

Figure 61 is a sequence map of plasmid pLM3. 

DEPOSITED MATERIALS 





Deposit                                         Date               Acc. Nr 

pCB201 plasmid DNA in E. coli                   3 December 1996    LMBP 3594 
Lambda clone 3B encoding hu-unc-53/1            3 December 1996    LMBP 3595 
MCF-7 clone z4 (mock)                           3 December 1996    LMBP 1600CB 
MCF-7 clone (pCB201)                            3 December 1996    LMBP 1601CB 
NIH-3T3 mock                                    3 December 1996    LMBP 1602CB 
NIH-3T3 pCB201                                  3 December 1996    LMBP 1603CB 
pLM1                                            13 November 1997   LMBP 3762 
pLM4                                            13 November 1997   LMBP 3763 
PEGFP72                                         13 November 1997   LMBP 3764 
pCB501                                          13 November 1997   LMBP 3765 
BAC clone comprising fragment of 
hu-unc53/2 gene                                 15 November 1997   LMBP 3773 
Worm strain with chimeric 
C. elegans/human unc-53 gene                    15 November 1997   LMBP-1663CB 


The above plasmids and cell lines were deposited 
at the Belgian Coordinated Collections of 
Microorganisms (BCCM), Laboratorium voor Moleculaire 
Biologie - Plasmidencollectie (LMBP), B9000 Gent, 
Belgium, in accordance with the provisions of the 
Budapest Treaty of 28 April 1977. 

The present invention will now be described with 
reference to the following examples which are not 
limiting. 

Identification of a human homologue of the UNC-53 
protein of C. elegans 

Extensive searches with the Ce-UNC-53 sequence 
(Figures 1 and 2) against the public domain databases 
(EST, GenBank, EMBL, Swiss-Prot and PIR) revealed no 
statistically significant homologies (a smallest sum 
probability (ssp) of 10e-8 is generally accepted to 
be significant at amino acid level). Two ESTs, 
gbH09036 (ssp = 1.1e-5), a Homo sapiens cDNA clone, 
and gbAA049124 (ssp = 8.6e-5), a mouse cDNA clone, showed 
homology to a "coiled coil" region, a common motif 
contributing to protein secondary structure 
(Figure 3). 

All other candidate scores were at background 
level (ssp > 0.21). Careful examination of weak 
candidate ESTs identified EST gb:R41071 from Homo 
sapiens, which had obtained a low score of 53 and a 
non-significant probability score of 0.33 (Fig. 4). 
The inventors surprisingly discovered potentially 
significant homology with the Ce-UNC-53 nucleotide 
binding domain, provided multiple frameshifts and 
sequence errors were hypothesized. 
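
By way of illustration only, the triage of database hits described above can be sketched as follows. The sketch is not part of the searches actually performed; the cutoff of 1e-8 is an assumed reading of the significance level quoted above, and the helper name classify_hit and the three class labels are hypothetical.

```python
# Illustrative triage of database hits by smallest sum probability (ssp).
# ASSUMPTION: the significance cutoff quoted in the text is read as 1e-8.
SIGNIFICANCE_CUTOFF = 1e-8
# Hits with an ssp above this value are described in the text as background.
BACKGROUND_LEVEL = 0.21

# ssp values quoted in the text for the three ESTs discussed.
QUOTED_HITS = {
    "gbH09036": 1.1e-5,
    "gbAA049124": 8.6e-5,
    "gb:R41071": 0.33,
}


def classify_hit(ssp):
    """Return a coarse label for a hit (hypothetical helper)."""
    if ssp <= SIGNIFICANCE_CUTOFF:
        return "significant"
    if ssp > BACKGROUND_LEVEL:
        return "background"
    return "weak"


labels = {name: classify_hit(ssp) for name, ssp in QUOTED_HITS.items()}
for name, label in labels.items():
    print(f"{name}: ssp={QUOTED_HITS[name]:g} -> {label}")
```

On the quoted values none of the three ESTs reaches the significance cutoff, and gb:R41071 falls in the background class, which is consistent with its having been recovered only by careful manual examination.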

The inventors amplified, cloned and sequenced 
part of gb:R41071 from human heart and human lung cDNA 
and from human genomic DNA and discovered that clone 
gb:R41071 had up to ten different mistakes in the 
region checked: five extra nucleotides were scattered 
along its sequence, two nucleotide substitutions 
were identified, and gb:R41071 lacked three 
nucleotides present in our clone (Fig. 5). The novel 
sequence obtained was two nucleotides shorter and 
showed the two UNC-53-homologous regions in frame. 
The genomic fragment obtained is larger (700 bp total 
length) than the corresponding cDNA clones, indicating 
the presence of an intervening sequence of around 500 
bp at nucleotide 162 of this fragment. The amplified 
cDNA fragment was cloned into vector pCRII 
(Invitrogen), named pCR231, and used as a probe 
to screen cDNA libraries. 

The conceptual translations of the clones we 
obtained by PCR were screened using BLAST and TBLASTN 
against all known protein and DNA sequences in the 
database. The only clone which came up with 
statistically significant similarity was Ce-UNC-53 
(Fig. 6). This human clone and Ce-UNC-53 thus form a 
unique homologous pair compared to the rest of the 
known sequences, indicating the statistical relevance 
and novelty of our discovery. We designate this human 
gene as hu-UNC-53/1. Human heart and human 
colorectal adenocarcinoma cDNA libraries were probed 
with the pCR231 probe to identify longer cDNA clones. The 
clones overlap, giving a linear sequence of 3706 bp 
(Figs. 8 and 26). This sequence shows a 959 amino acid 
open reading frame from the beginning of the clone. 
The absence of a 5' untranslated region suggests that 
the mRNA extends further 5'. 

Sequence alignment searches of the public domain 
databases with the DNA sequence of hu-UNC-53/1 and 
its conceptual translation identified a series of 
ESTs, most of which correspond to the 5' UTR region 
(Figures 7 and 8). Surprisingly, hu-UNC-53/1 
also identified the cDNA clones gbH09036 and 
gbAA049124, homologous to the predicted coiled coil 
region in Ce-UNC-53 and hu-UNC-53/1, and furthermore 
identified a third weakly homologous EST, gbR21023. 
The inserts of gbH09036, gbAA049124 and gbR21023 were 
obtained from the Merck consortium and sequenced. 

gbAA049124 is >95% identical to Hu-UNC-53/1 over 
604 available amino acids (Fig. 10) and is the mouse 
orthologue of Hu-UNC-53/1. The insert in gbH09036 is 
clearly homologous to hu-UNC-53/1 but derived from a 
different locus. We therefore name the gene 
identified by gbAA049124 Mu-UNC-53/1 and the gene 
identified by gbH09036 Hu-UNC-53/2 (Figure 11). 

5 domains of high similarity mark the unc-53 gene 
family 

Ce-UNC-53 and the vertebrate homologues identified 
here form a unique novel protein family that is 
distant from the remainder of the proteins in the 
public domain. Alignment of the predicted open 
reading frames shows that Hu-UNC-53/1 and Hu-UNC-53/2 
are equidistant from Ce-UNC-53. The highest homology 
is found in the carboxy-terminal amino acids of 
Ce-UNC-53. The presence of a conserved GXXGKS/T box 
suggests a nucleotide binding function. However, this 
domain as a whole does not belong to a class of known 
nucleotide binding proteins. 
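
As an illustration only, a box of the form GXXGKS/T can be located in a protein sequence with a simple pattern search. The sketch below is an assumption-laden example: the helper find_motifs and the toy sequence are hypothetical and are not taken from any UNC-53 family member, and the pattern follows the motif exactly as written above. The classical Walker A (P-loop) consensus is usually written GXXXXGK[ST]; a different pattern can be supplied through the pattern argument.

```python
import re

# Motif as written in the text: G-X-X-G-K-[S or T], X being any residue.
SOURCE_MOTIF = r"G..GK[ST]"


def find_motifs(protein, pattern=SOURCE_MOTIF):
    """Return (0-based start, matched residues) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in re.finditer(pattern, protein.upper())]


# HYPOTHETICAL toy sequence used only to exercise the search.
toy_sequence = "MAAGSPGKSTLLQ"
print(find_motifs(toy_sequence))
```

A hit of this kind only suggests a nucleotide binding function; as noted above, the domain as a whole does not fall into a known class of nucleotide binding proteins.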

The similarity amongst the presently known 
sequences of the UNC-53 family of proteins is highest 
in 5 blocks over most of the available sequence (959 
amino acids) and a further block identified in Figure 
12a. These blocks can be assigned signature sequences 
as displayed in figure 28, or can be assigned weight 
matrices based on the alignment between the different 
family members. By using truncated constructs of 
Ce-unc-53, the functional relevance of these domains 
has been addressed. 

Hu-UNC-53/1 and Hu-UNC-53/2 are complex 
transcription units 

1. Cancer cell line RNA blots probed with 
Hu-UNC-53/1. 

A Northern blot of poly-A+ RNA from several 
cancer cell lines (Melanoma G361, Lung Cancer A549, 
Colorectal Adenocarcinoma SW480, Burkitt Lymphoma 
DRajii, Leukemia Molt4, Lymphoblastic Leukemia K562, 
HeLa S3 and Promyelocytic Leukemia HL60) was probed 
using the whole insert of pHH3b. No or weak 
expression was detected in the Burkitt Lymphoma 
DRajii, the Leukemia Molt4 and the Promyelocytic 
Leukemia HL60 cell lines. Five different transcripts 
are detected in the remaining cancer cell lines: 
transcripts 1 and 2 are larger than 9.5 kb, transcripts 
3 and 4 are 6 to 7 kb and the fifth transcript is 
around 6 kb. Transcripts 1 and 2 are present in all 
expressing cell lines. Transcripts 3 and 4 are 
restricted to Melanoma G361, Lung Cancer A549 and 
Colorectal Adenocarcinoma SW480 and are the 
predominant transcripts in Melanoma G361 and 
Colorectal Adenocarcinoma SW480. Transcript 5 is 
restricted to Lymphoblastic Leukemia K562 and HeLa S3 
and is predominant in HeLa S3. 

2. Cancer cell line RNA blots probed with 
Hu-UNC-53/2. 

A similar set of cancer cell line Northern 
blots was probed with a 652 bp fragment of EST 46037 
amplified by using the primers 
5'-aggagatgaagctgacagatatcc and 5'-aaacaccagtgagtcc. 
Hu-UNC-53/2 is expressed in Melanoma G361, Colorectal 
Adenocarcinoma SW480, Lymphoblastic Leukemia K562 and 
HeLa S3. No expression was detected in Lung Cancer 
A549, Burkitt Lymphoma DRajii, Leukemia Molt4 and 
Promyelocytic Leukemia HL60. Interestingly, only two 
transcript sizes were detected: a transcript of around 
7 kb expressed in Lymphoblastic Leukemia K562 and HeLa 
S3, and a transcript of >9.5 kb in Melanoma G361 and 
Colorectal Adenocarcinoma SW480. 
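
For illustration, basic properties of the two primers quoted above can be checked as sketched below. The primer sequences are copied as printed and may be incomplete; the labels primer_1 and primer_2 and the helper names gc_fraction and reverse_complement are hypothetical, and the text does not state which primer is the forward primer.

```python
# Primer sequences as printed in the text (5' to 3'); they may be incomplete.
PRIMERS = {
    "primer_1": "aggagatgaagctgacagatatcc",
    "primer_2": "aaacaccagtgagtcc",
}

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("acgt", "tgca")


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.lower()
    return (seq.count("g") + seq.count("c")) / len(seq)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.lower().translate(COMPLEMENT)[::-1]


for name, seq in PRIMERS.items():
    print(name, len(seq), round(gc_fraction(seq), 3), reverse_complement(seq))
```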

3. Normal human tissue probed with Hu-UNC-53/1. 

A Northern blot of poly-A+ RNA from normal 
[...] intestine, ovary and prostate. Expression in 
peripheral blood leukocyte, lung, liver, kidney and 
spleen is barely detectable. 

4. Normal human tissue probed with Hu-UNC-53/2. 

A similar set of [...] were probed with a 
[...] of EST [...] using the 
primers 5'-aggagatgaagctgacagatatcc and 
5'-aaacaccagtgagtcc. Expression levels are low in all 
tissues, with the highest level in [...] 
in heart, placenta, lung, skeletal muscle and [...]. 
Expression is barely [...]. 

Clearly, the hu-UNC-53/1 and hu-UNC-53/2 homologues are 
tightly regulated, showing a strong 
tissue specificity and, probably, additional 
mechanisms of regulation, i.e. differential splicing or 
different promoters. The different proteins derived 
from RNAs identified by probe hh15 presumably share 
the carboxy-terminal nucleotide binding domain. 
Ce-UNC-53 was shown to be a complex genetic locus and 
complex transcription unit. The different transcripts 
are thought to be a mechanism to assure the necessary 
specificity and functional diversity of this signal 
transduction pathway, with respect to different 
signals and receptors, different tissues and different 
directions of migration. The occurrence of a new 
transcript or the observed changes in expression 
levels in the cancer cell line blot suggests a role 
for hu-UNC-53/1 and hu-UNC-53/2 in the establishment 
or maintenance of the transformed state of those 
cells. 

Phenotypic changes in cells transfected with the 
nucleotide binding domain of Ce-UNC-53 and 
Hu-UNC-53/1 

Ectopic expression of full length Ce-UNC-53 in C. 
elegans, murine neuroblastoma cells or human MCF-7 
breast carcinoma cells has been found to lead to 
increased filopodia outgrowth and increased motility 
(unpublished). The structure of Ce-UNC-53 protein is 
reminiscent of that of large kinases or dynamin, where 
a catalytic domain is positively or negatively 
regulated by domains that interface with signal 
transduction pathways (for example by GRB2 binding, 
phosphorylation or the like). The inventors therefore 
decided to test whether the nucleotide binding domain by 
itself is capable of inducing the observed changes in 
the microfilament cytoskeleton and motile or ruffling 
behaviour. 

cDNA fragments coding for the nucleotide binding 
domains of Ce-UNC-53 and Hu-UNC-53/1 were cloned in 
mammalian expression vectors with the CMV promoter 
(see experimental procedures). 

To be able to detect expression from pCB201 (Fig. 
13), an N-terminal his and a T7 epitope tag were fused 
in frame with the hu-UNC-53/1 cDNA hh15. pCDU3 
contains a larger fragment of Ce-UNC-53 and starts 
just before the conserved "VIELKIEL" domain (Fig. 12). 

The empty pcDNA3 vector or pcDNA3.1-His-LacZ, a 
mammalian expression vector for E. coli 
beta-galactosidase, was used as a control vector (mock 
transfection). The differences between mock and 
transfected N4 and MCF-7 clones were analysed using 
phase-contrast and Nomarski microscopy coupled with 
time lapse analysis, phagokinesis and 
immunocytochemical characterisation of the F-actin. 

Phenotypic changes in mouse N4 neuroblastoma cells 

N4 neuroblastoma cells were stably transfected 
with control construct pcDNA3.1 and the C. elegans 
UNC-53 constructs pTB72, pCDU2, pCDU3 and pCDU4. The 
population of clones transfected with the empty 
expression vector was homogeneous and similar to wild 
type N4 cells. In contrast, 25% to 50% of the 
clones transfected with pTB72, pCDU2, pCDU3 and pCDU4 
(see experimental procedures and Figs. 1, 17, 15 and 16 
respectively) had distinct phenotypes: 

1. Wild type N4 cells, or N4 cells transfected with 
pcDNA3 (designated as mock transfection), show a central 
cell body with extensions, designated as neurite 
outgrowths. Less than 5% of the population have 
lamellae. When present, they are generally situated 
on the cell body and on the opposite side from the 
neurite extensions (figure 22a). The lamellae show a 
radial actin spike pattern. Limited branching of the 
actin fibres is observed in wild type or pcDNA3 
transfected N4 cells. Side branches are smaller and 
can be clearly distinguished from the main actin 
branch (figure 23a). 

2. N4 cells stably transfected with pCDU4, 
harbouring the homology block E, show an overall 
morphology which is similar to that of wild type N4 cells 
(a cell body with neurite outgrowth). They exhibit, 
however, an increased frequency and level of lamellae 
formation (figure 22b). These lamellae, which contain 
F-actin microspikes, are found on both the cell body 
and the neurite outgrowth (figure 23b). Wild type N4 
cells, in contrast, rarely exhibit lamellae on 
the neurite outgrowths. 

3. N4 cells stably transfected with pCDU3, 
encoding homology blocks C, D and E, show an even 
higher level of lamellae formation. Labelled with 
TRITC-phalloidin, the cells appear surrounded with 
F-actin fibres, consisting of bundles of F-actin 
microspikes (figure 23c). The presence of these 
lamellae has completely modified the general 
appearance of the cells. They appear flatter and, in 
90% of the population, it is not possible to 
distinguish between the cell body and the wide neurite, 
as they flow gradually into one another (figure 22c). 
If wild-type-like thin neurite-like outgrowths are 
present, they are frequently numerous, branched and 
located all around the cell. 

4. The overall morphology of N4 cells stably 
transfected with pCDU2, encoding homology blocks 
A, B, C, D and E, resembles that of the wild type 
cells, since cell body and neurite outgrowth can be 
clearly distinguished. The pCDU2 transfected cells 
however show more neurite outgrowths, and these are 
long and very branched, especially at the end of the 
outgrowth. When neurite outgrowths of different cells 
make contact, increased branching can be observed, 
giving the appearance of a network (figure 22d). N4 
cells transfected with pCDU2 show bundles of long 
radial F-actin filaments (microspikes), which can be 
branched, especially apically. The space between the 
hand-shaped actin spikes is mostly filled in with 
actin, leading to small lamellae-like structures. 
The network-like branching between the cells also 
shows both the bundled actin structures and the 
lamellae-like fill-in features. These dense F-actin 
structures are sometimes seen on the cell body, which 
enhances the network-like appearance of the cells 
(figure 23d). 

5. N4 cells stably transfected with plasmid 
pTB72, encoding the full length C. elegans UNC-53 
protein, seem to have a more rigid structure than wild 
type cells, most clearly seen as spindle-like and 
triangle-like cells. The corners of these cells show 
an increased level of hand-like lamellae structures. 
This specific phenotype is best seen when the cells 
are grown at low density (figures 22e and 23e). 

Phenotypic changes in human breast carcinoma MCF-7 
cells 

MCF-7 cells were stably transfected with 
pTB72, pCDU2, pCDU3, pCDU4 and pCB201. The population 
of clones transfected with the LacZ expression vector 
was homogeneous and similar to wild type MCF-7 cells. 
In contrast, approximately 30-50% of the clones transfected 
with pTB72, pCDU2, pCDU3, pCDU4 and pCB201 had 
distinct phenotypes, which were analysed as above for 
the N4 cells: 

1. Wild type and mock (pcDNA3) transfected MCF-7 
cells are heteromorphic. In general they are round 
cells or clusters of cells surrounded by lamellae. 
Bulges, similar to thick filopodia, can be observed 
(figure 19a). When the cells are stained with FITC- 
or TRITC-coupled phalloidin, F-actin stress 
fibres can be observed, often in rings surrounding the 
cell body (figures 20a and 21a). When cells are rounded up 
like this, actin is present at the edge of the cell 
body. Less than 10% of the cells display filopodia 
filled with radial F-actin microspikes. In time-lapse 
analysis the cells are highly quiescent, with limited 
ruffling at the edge of the cell. 

2. MCF-7 cells transfected with pCDU4, encoding 
homology block E, show two major phenotypic 
differences compared to the wild type cells. These 
cells are flatter and have more extended 
lamellipodia, leading to a pancake-like appearance. 
Some clones show more filopodia than wild type (figure 
19b). Radially organised F-actin fibres can clearly 
be observed in the lamellae surrounding the cells. 
These stress fibres resemble the wild-type structures, 
but have a more radial than circular orientation. In 
the filopodia, one can observe an increase of 
apparently unorganised bundles of actin patches 
(figure 21b). 

3. MCF-7 cells stably transfected with pCDU3, 
encoding the homology blocks C, D and E, show a 
strikingly different and constant morphology. The 
cells appear smaller than wild type because they are 
more rounded up. All the cells have more filopodia, 
surrounding the cell body (figure 19c). 
Morphologically these filopodia have the same 
"hand-like" appearance as those observed in N4 
neuroblastoma cells. Such filopodia are hardly ever 
observed in mock transfected MCF-7 cells. These 
filopodia are filled with F-actin fibres. Compared to 
wild type cells, fine actin stress fibres are decreased 
(figure 21c). In time-lapse analysis single cells as 
well as clusters of cells can be seen to ruffle much 
more dynamically than single or clusters of wild type 
cells. The "half-life" of a filopodial outgrowth on 
the cell surface is much shorter in transfected cells 
and the number of filopodia present at any time is 
higher. 

4. Cells transfected with pCB201 (which is 
structurally similar to pCDU4 but human) have a 
phenotype that is nearly indistinguishable from that 
of cells transfected with pCDU3, except that the 
ruffling activity and filopodia outgrowth are even 
higher than with pCDU3 (figure 18). 

5. The overall morphology of the MCF-7 cells 
transfected with pCDU2, which encodes the homology 
blocks A, B, C, D and E, resembles that of the pCDU3 
transfected cells. The cells are more rounded up and 
show more filopodia than the wild type and mock 
transfected cells (figure 19d). The filopodia, which 
are all around the cell body, tend to be longer and 
show a difference in actin organisation. The small 
filopodia have the same actin bundles as seen in the 
pCDU3 transfected cells. In the longer filopodia, the 
actin bundles are more parallel, and radial to the 
cell body (figure 21d). 

6. MCF-7 cells stably transfected with pTB72, 
encoding the full length UNC-53 protein, are extremely 
rounded up, and tend to adhere more than wild type 
cells. The cells grow in clusters with sausage- or 
tube-like shapes. The presence of large, extremely 
thin lamellae with a surface area of more than three 
times that of the central cell body forms a second 
morphological feature, unique to the pTB72 
transfected MCF-7 cells (figure 19e). These sheets 
are difficult to observe under a phase contrast 
microscope, but are very clear when stained with 
phalloidin. The lamellae protrude from one side of a 
cell or group of cells and are filled with thin, long, 
criss-crossing actin fibres, different from "giant" 
wild type MCF-7 cells (figure 21e). 

These experiments lead to the following set of 
conclusions (Figure 47 summarises the data of the 
domain swapping experiments in C. elegans unc-53): 

1. Murine and human cells transfected with the 
Ce-UNC-53 or hu-UNC-53/1 domains show clear effects on 
the nature and dynamics of their motile behaviour, as 
demonstrated by changes in the F-actin cytoskeleton 
(the increase in lamellipodia, hand-like filopodia and 
"hair-like" microspikes on the cell surface and the 
associated reduction of the "rings of F-actin" stress 
fibres). 

2. This effect is found in two cell types of 
different species and tissue origin: MCF-7 cells 
(human breast carcinoma cells of epithelial origin) 
and murine N4 neuroblastoma cells. pCB201, pCDU3 and 
pCDU4 induce in MCF-7 cells a type of filopodium which 
is frequent in wild type N4 cells but rare to absent 
in wild type MCF-7 cells, suggesting the activation by 
these constructs of motile behaviour which is "normal" 
in N4 cells but of an unusual type in MCF-7 cells. 
This indicates the activation of a specific downstream 
process as opposed to a disruption of an existing 
process. It is well known that some cell types prefer 
to migrate with filopodia and other cell types with 
lamellipodia. 

3. Expression of pCB201, pCDU3 and pCDU4 gives 
qualitatively similar F-actin remodelling and 
increased filopodia and lamellipodia outgrowth. 
pCB201 and pCDU3 are however much more active in this 
process than pCDU4. 

4. pCB201 is a much more potent activator of 
filopodia outgrowth than pCDU4, which is to be 
expected considering the large evolutionary distance 
between C. elegans and vertebrates. 

5. These experiments identify homology domain E 
(predicted nucleotide binding domain) of UNC-53 as the 
"domain" that activates F-actin remodelling and 
filopodia/lamellipodia outgrowth. Progressive 
addition of the aminoterminal homology domains A, B, C 
and D leads to qualitative and quantitative modulation 
of the phenotype conferred by domain E. 

6. Homology domains C and D (pCDU3) enhance the 
basic activity present in homology domain E 
(pCDU4/pCB201). 

7. Homology domains B and C (pCDU2) 
qualitatively modify the phenotype of domain E, 
leading to lamellipodia formation that is 
morphologically different from that of pCDU3 
transfected cells. It is thought that lamellipodia and 
filopodia formation are mediated by different signal 
transduction pathways requiring two related but 
different Ras-like G-proteins: RAC for 
lamellipodia formation and CDC42 for filopodia 
formation. 

8. pTB72, which includes homology domains 
A, B, C, D and E plus an additional 700 amino acids not 
yet identified in the human members of the family, 
confers a more localised filopodia outgrowth and a 
different morphology. 

9. The expression levels of pTB72 (full length 
C. elegans UNC-53), pCDU3, pCDU4 and pCB201 are 
extremely low. The observed effect is therefore 
unlikely to be due to dominant negative effects (such 
as stoichiometric depletion of other cellular 
components) or structural changes in the actin 
cytoskeleton mediated by UNC-53 or its fragments. 

The data point to a multi-domain organisation in 
UNC-53 whereby the aminoterminal domains exert 
positive (e.g. pCDU3) and negative (e.g. pCDU2) 
control on the activity of domain E, or lead 
to novel activities or to the localisation of the 
activity in the cell (pCDU2, pTB72). Our observation 
that the nucleotide binding domains (NTB) of distantly 
related members of the UNC-53 family induce similar 
phenotypes suggests a general role for this domain of 
the UNC-53 family. 



CELLULAR ASSAYS TO IDENTIFY PHARMACOLOGICAL 
MODULATORS OF UNC-53 AND COMPONENTS OF THE UNC-53 
PATHWAY 

Mammalian and human cells transfected with 
plasmid constructs containing unc-53 sequence of 
either C. elegans or human origin were observed to 
display obvious, specific and similar changes in 
comparison to mock or untransfected parent cells. 
These changes relate to the functioning of the 
cytoskeleton, in particular the F-actin cytoskeleton, 
to cell locomotion and to directional cell motility, and 
indicate that UNC-53 gene family members are capable of 
playing an integrator function in cell motility. 

The cellular tools derived through transfection, 
and the functional assays derived with these cells, not 
only enable characterisation of the motile phenotype 
typically observed after introduction of unc-53 genes; 
they can also be easily adapted to screen for 
pharmacological compounds that interfere with either 
(1) the expression of unc-53 gene family members or (2) 
the cellular functioning of unc-53 transgene(s) and of 
components in the unc-53 signal transduction pathway. 

Two classes of pharmacological modulators are 
envisaged. 

A first class are inhibitors of UNC-53s or the 
unc-53 pathway(s), which revert the described 
phenotypic changes induced by unc-53 transgenes or 
aspects thereof. Such compounds are considered 
relevant leads to target diseases where unwanted 
directional motility of cells occurs, such as 
metastasis, angiogenesis or inflammation. 
Secondly, pharmacological stimulators are 
envisaged, such as compounds which, in 
non-transfected cells, induce or mimic 
(aspects of) the described "unc-53" phenotype. Such 
compounds may do so by inducing or upregulating 
expression levels of a known unc-53 gene or by 
activating endogenous (yet unidentified) members of 
the unc-53 gene family. The target applications here 
are wound and tissue repair, in particular 
neuronal regeneration and plasticity. 

The nature of compounds envisaged can be small 
(organic) molecules or bio-molecules such as peptides, 
sense or antisense (oligo-)nucleotides, or chemical 
modifications thereof. Alternatively, compounds can 
be thought of as a series of plasmid nucleotide 
constructs containing gene sequences in a screen for 
novel unc-53-unrelated genes with a similar functional 
effect in the cell, genes related to the unc-53 gene 
family, or novel members of the unc-53 gene family 
based on sequence similarity, such as for example the 
genes in plasmids pTB72, pCDU3, pCDU4, pCDU2 and pCB201, 
or modifications thereof such as for example epitope 
tagged, deletion, complementation or mutagenised 
nucleotide constructs. 

The cellular assays envisaged in the claims have 
been exemplified for three cell lines: the human 
breast carcinoma cell line MCF-7, the mouse neuronal 
cell line N4 and the mouse fibroblast cell line 
NIH-3T3. Pharmacological assays are focused on 
quantification of endpoints in a high throughput 
screening mode. Many of the computer aids for 
(semi-)automation are well known in the field and 
currently applied in the applicant's laboratories. Given the 
subtlety of the phenotypes observed, primary focus was 
given to morphological assays that assess the 
phenotypes or aspects thereof. 

The nucleotide binding domain of Hu-UNC-53/1 has 
transforming activity in NIH3T3 fibroblasts 

Biochemical and genetic analyses suggest that 
UNC-53 functions in GRB-2 mediated signal transduction 
pathways controlling cell motility. The occurrence of 
an altered hu-UNC-53/1 mRNA pattern in cancer cell 
lines moved us to investigate whether hu-UNC-53/1 
plays a role in the transformed state of those cells. 

To this end, we tested the ability of the nucleotide 
binding domain of hu-UNC-53/1 and Ce-UNC-53 to 
transform NIH/3T3 cells. Construct pCB201 
(hu-UNC-53/1), which induces ruffling behaviour and cell 
motility, was transfected into NIH3T3 cells. 
Positive controls included Myc and H-ras. Negative 
controls included empty vector and Rac1N17 and 
cdc42N17. 

The cells that survived G418 selection were 
assayed for loss of contact inhibition (their ability 
to grow as foci). Positive controls included the 
combination of two well known oncogenes, Myc and H-ras, 
which were able to produce a high number of foci. The 
nucleotide binding domains of both Ce-UNC-53 and 
hu-UNC-53/1 are able to induce foci in this assay 
(Fig. 24 and Table 1). 

This suggests that the function of UNC-53 is not 
restricted to the activation of motility. UNC-53 may 
exert this additional function through the activation 
of as yet unidentified signal transduction 
pathways. Oncogenes frequently arise when a 
"controlling" domain and an "activation" domain are 
separated through chromosomal rearrangements or 
integration of part of a gene into an oncogenic 
virus, e.g. Erb receptor tyrosine kinases, or Ost, a 
nucleotide exchange factor for Rac-1. 

Hu-UNC-53/1 is localized to chromosome 1q31.1 

Clone F226 (BACH-135 (014), Genome Systems, Inc.) 
was isolated from a human genomic BAC library using 
pCR231 as a probe and was confirmed by sequence 
analysis to be derived from the hu-UNC-53/1 locus. 
Purified DNA from clone F226 was labeled with 
digoxigenin dUTP by nick translation. Labeled probe 
was combined with sheared human DNA and hybridized to 
normal metaphase chromosomes derived from PHA 
stimulated peripheral blood lymphocytes in a solution 
containing 50% formamide, 10% dextran sulfate and 2X 
SSC. Specific hybridization signals were detected by 
incubating the hybridized slides in fluoresceinated 
antidigoxigenin antibodies followed by counterstaining 
with DAPI. The initial experiment resulted in 
specific labeling of the long arm of a group A 
chromosome. A second experiment was conducted in 
which an anonymous probe, previously mapped 
to 1p34 and confirmed by cohybridization with a 
chromosome 1 centromere specific probe, was 
cohybridized with F226. The experiment resulted in 
the specific labeling of the long and short arms of 
chromosome 1. Measurements of 10 specifically hybridized 
chromosomes 1 demonstrated that F226 is located at a 
position which is 52% of the distance from the 
heterochromatic-euchromatic boundary to the telomere 
of chromosome arm 1q, which corresponds to band 
1q31. A total of 80 metaphase cells were analyzed, 
with 72 exhibiting specific labeling (Fig. 25). 
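The positional measurement described above can be illustrated with a minimal sketch. The coordinates below are hypothetical values in arbitrary units, not data from the experiment; only the 72-of-80 labeling count is taken from the text. The function name `fractional_position` is introduced here for illustration only.

```python
def fractional_position(boundary, signal, telomere):
    """Signal position as a fraction of the boundary-to-telomere distance."""
    arm_length = telomere - boundary
    if arm_length <= 0:
        raise ValueError("telomere must lie beyond the boundary")
    return (signal - boundary) / arm_length


# Hypothetical (boundary, signal, telomere) coordinates for three chromosomes.
measurements = [(0.0, 2.6, 5.0), (0.0, 3.1, 6.0), (1.0, 3.6, 6.0)]
fractions = [fractional_position(*m) for m in measurements]
mean_fraction = sum(fractions) / len(fractions)

# Labeling efficiency reported in the text: 72 of 80 metaphase cells.
labeling_efficiency = 72 / 80

print(f"mean fractional position: {mean_fraction:.2f}")
print(f"labeling efficiency: {labeling_efficiency:.0%}")
```

Averaging the per-chromosome fractions, rather than the raw distances, keeps the estimate independent of how condensed each individual chromosome is.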

Gains of DNA sequences in 1q31 were found in more 
than 10% of primary bladder tumors (Genes Chromosom 
Cancer 12: 213-219 (1991)). A putative tumor 
suppressor gene located near the locus F13B on 
chromosome arm 1q31-q32 appears to be involved in the 
pathogenesis of medulloblastoma (Int. J. Cancer 67: 
11-15 (1996)). Loss of heterozygosity in this region 
of chromosome 1 has been implicated in development of 
human hepatoblastoma. Partial trisomies of 1q31 were 
found in Ewing's Sarcoma cell lines isolated from 
patients (Cancer Genet Cytogenet 12: 1-19 (1984)). 

Hu-UNC-53/2 is localised to chromosome 11p15.1 

DNA from BAC clone F329 for Hu-unc-53/2 was 
labeled with digoxigenin dUTP by nick translation and 
applied in the experimental settings used for FISH of 
Hu-unc-53/1 with F226. The initial experiment with 
F329 resulted in the specific labeling of the mid 
short arm of a group C chromosome which was believed 
to be chromosome 11 on the basis of size, morphology 
and banding pattern. A second experiment was 
conducted in which a biotin labeled probe specific for 
the centromere of chromosome 11 (D11Z1) was 
cohybridised with clone F329. This experiment 
resulted in the specific labeling of the centromere in 
red and the mid short arm in green of chromosome 11. 
Measurements of 10 specifically labeled chromosomes 11 
demonstrated that F329 is located at a position which 
is 65% of the distance from the centromere to the 
telomere of chromosome arm 11p, an area which 
corresponds to band 11p15.1. A total of 80 metaphase 
cells were analysed, with 72 exhibiting specific 
labeling. 

Chromosome 11p15 is a region showing loss of 
heterozygosity (LOH) in a variety of human 
malignancies, primarily breast cancer (Ali et al., 
Science 238, 185-188 (1987); Winqvist et al., Cancer 
Res. 53, 4486-4488 (1993)) but also Wilms' tumor 
(Dowdy et al., Science 254, 293-295 (1991); Cowell et 
al., Br. J. Cancer 67, 1259-1261 (1993)), ovarian and 
testicular malignancies (Lothe et al., Genes 
Chromosomes Cancer 7, 96-101 (1993); Weitzel et al., 
Gynecol. Oncol. 55, 245-252 (1994)), stomach cancer 
(Baffa et al., Cancer Res. 56, 268-272 (1996)), lung 
cancer (Ludwig et al., Int. J. Cancer 49, 661-665 
(1991); Fong et al., Genes Chromosomes Cancer (1994)) 
and infantile tumors of adrenal and liver (Byrne et al., 
Genes Chromosomes Cancer 8, 104-111 (1993)). Since 
LOH is believed to indicate inactivation of a tumor 
suppressor gene at the location where LOH occurs, the 
frequent LOH found at 11p15 in multiple human cancers 
suggests the presence of either a cluster of tumor 
suppressor genes or a single tumor suppressor in this 
region (Seizinger et al., Cytogenet. Cell Genet. 58, 
10080-10096 (1991)). Chromosome transfer studies have 
shown that chromosome 11 can suppress tumorigenicity 
of both human breast cancer (Negrini et al., Cancer 
Res. 55, 3003-3007 (1995)) and Wilms' tumor cells 
(Dowdy et al., Science 254, 293-295 (1991)), and a gene 
(named HTS1 or ST5) that may be responsible for 
suppressing tumorigenicity in HeLa cells has been 
mapped to 11p15 (Lichy et al., Cell Growth Diff. 3, 
541-548 (1992)). Abnormalities at 11p15 have also 
been identified in a variety of other cancers, 
including lung cancer (parental origin of 11p15 
deletion) (Kondo et al., Oncogene 9, 3063-3065 
(1994)), bladder cancer (Presti et al., Cancer Res. 
51, 5405-5409 (1991)), myeloid leukemia 
(translocation) (Nakamura et al., Nat. Genet. 12, 154- 
158 (1996)), malignant astrocytomas and other 
primitive neuroectodermal tumors (deletions) (Fults et 
al., Genomics 14, 799-801 (1992)), rhabdomyosarcoma 
(Scrable et al., Nature 329, 645-647 (1987)) and 
hepatocellular carcinoma (Fujimori et al., Cancer Res. 
51, 89-93 (1991); Wang et al., Cell Genet. 48, 72-78 
(1988)). Recently a gene, TSG101, was cloned that is 
mutated in human breast cancer and deleted in 
uncultured primary human breast carcinomas (Li et al., 
Cell 88, 143-154 (1997)). 

DIAGNOSTIC ASSAY USING THE DNA SEQUENCE OF HUMAN 
UNC-53S 

The differential expression of human unc-53 
transcripts in Northern blots of normal tissues versus 
transformed cell lines, and the chromosomal locus of 
hu-unc-53/1 at 1q31, a locus linked to three 
diseases, suggest the potential implication of 
hu-unc-53 genes in oncogenesis. By using the complete 
DNA sequence of hu-unc-53/1 or /2, or fragments thereof, 
in FISH, the potential involvement of these genes can 
be diagnosed in patients, as exemplified in figure 26. 
Likewise, these hu-unc-53 sequences can be used in 
diagnostic PCR assays to determine 
overexpression of hu-unc-53s or fragments thereof. 

Assay for the microscopic phenotype of UNC-53 
transfected MCF-7 cells 

Mock and unc-53 transfected MCF-7 cells were 
seeded at low density in culture plates and allowed to 
adhere to the vessel. Light microscopic inspection at 
different time points, either on live cells or after 
chemical fixation with Karnovsky's fixative, revealed 
that cells in pCB201-transfected MCF-7 cultures had a 
rounded cell body with many filopodia at its 
boundary. In contrast, mock or untransfected clones 
had a predominantly "flat" phenotype with few or no 
filopodia. Quantitative measurements confirmed the 
statistical significance of this shift in phenotype 
(Table 2 below). 

TABLE 2 

Quantification of phenotypic changes in unc-53 transfected MCF-7 cells (*) 

Transfection   Clone   No feet (**)   With feet (**)   Fraction with feet 
mock           e       34             8                0.19 
                       37             0                0 
pCB201         2       17             92               0.84 
                       37             83               0.69 
               16      27             62               0.70 
                       20             71               0.78 
                       13             85               0.87 



(*) Clones were passaged thrice, frozen and stored. 
Thawed cells were trypsinised at confluency, 
monodispersed, seeded in flasks and allowed to attach 
to substrate overnight to 48 hours. Cultures were 
fixed with Karnovsky fixative and inspected using 
phase contrast microscopy. In parallel experiments, 
resistance to geneticin was confirmed. 

(**) Values are expressed as cells per 
microscopic view. 
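As a check on the last column of Table 2, the fraction of cells with feet can be recomputed from the two count columns. The sketch below assumes the fraction is simply "with feet" divided by the total count per view, which reproduces the tabulated values; the variable and function names are illustrative.

```python
# (transfection, cells without feet, cells with feet), as listed in Table 2.
table2_counts = [
    ("mock", 34, 8),
    ("mock", 37, 0),
    ("pCB201", 17, 92),
    ("pCB201", 37, 83),
    ("pCB201", 27, 62),
    ("pCB201", 20, 71),
    ("pCB201", 13, 85),
]


def fraction_with_feet(no_feet, with_feet):
    """Fraction of counted cells that carry filopodia ("feet")."""
    total = no_feet + with_feet
    return with_feet / total if total else 0.0


fractions = [
    (name, round(fraction_with_feet(no_feet, with_feet), 2))
    for name, no_feet, with_feet in table2_counts
]

for name, value in fractions:
    print(f"{name:8s} {value:.2f}")
```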

Assay for ruffling and motile behaviour using 
automated time lapse 

The dynamic changes in cells are well known in 
the field. Animations of e.g. actin ruffles in 
astrocytoma cells or of actin-based cell motility in 
e.g. fibroblasts can be accessed at 
(http://www.stc.emu.edu/CLMIBhp/Imggallpg/Moviespg/actinruffle.mov) or 
(http://util.ucsf.edu/mitchi/Movies/migration.html) on 
the world wide web. The dynamic changes resulting 
from transfection with unc-53 can best be appreciated in 
time lapse video sequences. At high magnification, 
the "filopodia" display arrays of microspikes with 
highly dynamic behaviour. A rough visual estimate 
suggests that these phenomena are increased at least 
10-fold in pCB201 transfected cells relative to the 
mock-transfected MCF-7 cells. Animations of these 
clones in NIH-Image can be requested from the author or 
applicant. 

Time lapse video imaging is probably the most 
informative way to appreciate the unc-53-induced 
phenotype in MCF-7 cells and is amenable to high throughput 
screening in a pharmacological context. Time lapses 
compressing 5 minutes of real time supply sufficient 
information to quantitate the intensity of the motile 
behaviour of pCB201 transfected MCF-7 cells in e.g. 12 
well plates. In addition, algorithms have been 
described in the field which can automatically compute 
the "motile area" of cells by comparing cells in two 
images appropriately spaced in time (van Laerebeke 
et al., 1992, Cytometry, 13, 1-8). 
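The idea of comparing two images spaced in time can be sketched as follows. This is not the published algorithm cited above. It is a minimal illustration which assumes each frame has already been segmented into a binary mask (1 = cell, 0 = background) and takes the "motile area" to be the number of pixels covered by the cell in exactly one of the two frames. The function name and the example masks are hypothetical.

```python
def motile_area(mask_t0, mask_t1):
    """Count pixels occupied by the cell in exactly one of two binary masks."""
    if len(mask_t0) != len(mask_t1):
        raise ValueError("frames must have the same number of rows")
    changed = 0
    for row0, row1 in zip(mask_t0, mask_t1):
        if len(row0) != len(row1):
            raise ValueError("frames must have the same number of columns")
        changed += sum(1 for a, b in zip(row0, row1) if bool(a) != bool(b))
    return changed


# Hypothetical 3 x 4 masks of the same cell at two time points.
frame_a = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
frame_b = [
    [0, 0, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]

print(motile_area(frame_a, frame_b))
```

Dividing the result by the cell area in the first frame would give a size-independent measure, which may be preferable when comparing clones with different cell shapes.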

Assay for visualising unc-53-induced F-actin 
recruitment in MCF-7 cells 

Cultures were chemically fixed, detergent 
extracted and fluorescently stained for F-actin 
(filamentous actin) using fluorescently labeled 
phalloidin (Wieland et al., 1985, Int. J. Peptide & 
Protein Res., 21, 3-10), which displays more 
specifically the dramatic phenotypic changes in response to 
transfection with unc-53 transgenes. By capturing 
and analysing images of the F-actin patterns, image 
analysis algorithms well known in the field can assess, 
in an automated way, the F-actin filament positions, 
texture and distribution relative to the nuclear 
position or gravity point of the cells. Such 
algorithms are capable of discriminating phenotypic 
changes and thus also the effects of pharmacological 
inhibitors of transgene-induced phenotypes, as well as 
compound-induced unc-53-like phenotypes in mock or 
untransfected cells. 

Phagokinesis assay for unc-53-induced 
directionality and quantity of motility 

The methods are described in the experimental 
section. Two cell populations with different motile 
behaviour in phagokinesis assays were observed. 
Table 3 below shows the fraction of mock and UNC-53 
transfected MCF-7 cells that produced linear tracks in 
the phagokinesis assay. In the mock 
transfected MCF-7 cells, 61% of the cells produced a 
round track (long and short axis less than 2-fold 
different) and 39% of the cells produced "linear" tracks 
(long and short axis more than 2-fold different). 
In pCB201 transfected MCF-7 cells the fraction of 
cells displaying "linear" tracks increased to 
50%. An increase in the fraction of linear tracks was 
also observed for MCF-7 cells transfected with full-length 
Ce-unc-53. 

In addition, a significant increase of 50% in the 
median area of tracks of a culture vessel was observed 
in the pCB201 transfected MCF-7 cells versus mock 
transfected MCF-7 cells (Table 3). These observations 
suggest that pCB201 as well as pTB72 transfection into 
MCF-7 cells is capable of increasing in situ 
locomotion of the transfected MCF-7 cells, e.g. by increasing 
spreading, ruffling, or other forms of non-directional 
motility in the "round" population. In addition, the 
Ce-UNC-53 transgene in MCF-7 cells drives a fraction 
of the MCF-7 cells from non-directional motility 
(round tracks) into directional migration (linear 
tracks). Clone 2 thus provides a tool to analyse 
inhibitory or stimulating effects of pharmacological 
compounds on directionality or quantity of cell 
motility in relation to UNC-53. 

Table 3. Analysis of motility in phagokinesis assays 

Track morphology: fraction linear tracks 

Plasmid   Clone     Round   Linear   l/r 
Mock      Z4        18      13       0.42 
                    17      11       0.39 
                    22      12       0.35 
                    16      9        0.36 
pCB201    Clone 2   13      13       0.5 
                    7       8        0.53 
                    9       9        0.5 

Track size 

Clone     Average±SD   Min    Max    (N) 
Z4        1626±188     1444   2011   (8) 
Clone 2   2326±283     1989   2816   (8) 
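The track classification rule and the last column of the morphology part of Table 3 can be expressed compactly. The sketch below makes two assumptions that are not stated in the text: a track whose axes differ by exactly 2-fold is counted as round, and the tabulated value is the number of linear tracks divided by the total number of tracks (which reproduces the listed figures). All names are illustrative.

```python
def classify_track(long_axis, short_axis):
    """Return 'linear' if the long axis is more than twice the short axis."""
    if short_axis <= 0 or long_axis < short_axis:
        raise ValueError("expected long_axis >= short_axis > 0")
    return "linear" if long_axis / short_axis > 2 else "round"


def linear_fraction(round_count, linear_count):
    """Fraction of tracks classified as linear."""
    return linear_count / (round_count + linear_count)


# (round, linear) track counts per experiment, as listed in Table 3.
table3_counts = [(18, 13), (17, 11), (22, 12), (16, 9), (13, 13), (7, 8), (9, 9)]
fractions = [round(linear_fraction(r, l), 2) for r, l in table3_counts]

print(classify_track(10.0, 3.0), classify_track(5.0, 4.0))
print(fractions)
```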



Assays for the localisation of unc-53 in the cell 
to microtubules or microtubule (+) plus ends 

UNC-53s have been shown to reside on microtubules 
and preferentially on the microtubule (+)-ends of 
cells. This localisation represents an important 
feature of the UNC-53 family of proteins, which is 
rarely observed in other proteins. Absence of 
microtubule (+)-end binding in the protein APC 
following mutation has been implicated in the role of APC 
in colon cancer (Smith et al., 1994, Cancer Res., 54, 
3672). By analogy, it can be postulated that the 
proper functioning of UNC-53 may also depend on its 
specific localisation in the cell. 

The methods used in the examples which prove the 
co-localisation with microtubules form a basis for a 
series of assays for compounds which specifically 
affect microtubule (+)-end binding of UNC-53s. To the 
skilled eye, the typical localisation of an UNC-53 
protein on microtubules can be readily recognised and 
is thus sufficient for judging whether 
treatment with a compound has affected the 
localisation of this UNC-53 (or a fragment thereof). 
Moreover, by combining the described co-localisation 
methods, which are well known to one skilled in the field 
and exemplified by the methods in the "experimental 
procedures" section, one can unequivocally confirm a 
compound's ability to abrogate (or promote) 
microtubule and microtubule (+)-end binding. 

Such an assay comprises contacting a cell culture 
of a cell line expressing an UNC-53 with a compound in 
the culture conditions proper for the said cell line, 
followed by an incubation and finally observation of 
the UNC-53 (or fragment) in situ by e.g. fluorescence 
microscopy (for GFP-chimeras) or by fixing the cell 
culture and performing an immunocytochemical staining 
for the UNC-53 (or fragment). For the 
co-localisation, methods such as immunocytochemistry for 
the microtubules of a cell or cell line, combined with 
either immunocytochemistry for Ce-UNC-53 or Hu-UNC-53s 
or fluorescent detection of GFP-UNC-53 chimeras, are 
performed consecutively. 

C.elegans UNC-53 preferentially binds microtubule 
plus-ends or GTP-tubulin 

Biochemical characterisation of UNC-53 has shown 
that UNC-53 binds the SH3 binding domains of 
SEM-5/GRB-2 and binds F-actin in vitro. GRB2 has been 
localised to the cortex of the cell and reported to be 
involved in the control of cell motility. To 
determine the in vivo subcellular localisation of 
Ce-UNC-53, we transiently transfected COS, HepG2 and MCF7 
cells with pTB72, an expression construct containing 
the full length Ce-unc-53 cDNA. This construct was 
previously shown to activate cell motility in N4 
neuroblastoma and MCF7 cells. This construct gives 
high transient expression in COS cells, high to medium 
levels of expression in MCF7 cells and medium to low 
levels of expression in HepG2 cells. To visualise 
UNC-53, tubulin and F-actin, transfected cells were 
stained with various combinations of the anti-Ce-UNC-53 
mab 16-48-2, rabbit anti-UNC-53 polyclonal, 
anti-tubulin mab YL1/2 and fluorescently labelled 
phalloidin. 

At high levels of expression UNC-53 co-localises 
with the entire microtubule cytoskeleton, but at lower 
expression levels the UNC-53 signal is restricted to the 
terminal regions of the microtubules at the plus ends. 
Very low levels of expression yield a dot-like 
pattern in the vicinity of the cortex of the cell. 

To map the MTB plus end domain of Ce-UNC-53, we 
made two constructs, pCDU2 (figure 17) and pCDU3 (figure 
15), in which the amino terminus of Ce-UNC-53 is deleted. 
Proteins corresponding to these constructs are thought 
to be made in vivo from different unc-53 promoters. 
Transient transfections followed by immunolocalisation 
showed these proteins to be cytoplasmic. In stable 
transfections in N4 neuroblastoma cells and MCF7 cells 
they were shown to be no longer toxic to cells but 
to cause highly increased activation of filopodia 
formation. We thus uncoupled (1) toxicity of Ce-UNC-53 
from activation of motility and (2) microtubule 
binding from the activation of motility. 






WO 98/24810 



PCT/EP97/06956 






Analysis of the microtubule association of the 
C.elegans and Human1 UNC-53 

To isolate the microtubule association domain of 
the C.elegans UNC-53, N-terminal GFP fusions were made. 
C-terminal deletions of the fusion product revealed 
that the microtubule association was localised in the 
N-terminal half of the protein. A GFP fusion was also 
constructed with the Human1-UNC-53, to analyse the 
microtubule association properties of this protein. 
The association with microtubules was confirmed. A 
mouse antiserum was used to detect 
native Unc-53 on microtubule plus ends of melanoma 
line G361. The epitope recognition of the antibody 
was confirmed by immunohistology experiments with 
mammalian cells transiently transfected with pLM4, 
expressing the GFP-hu1-UNC-53 fusion protein. 



Results 



1. When pTB72 is transiently transfected into 
several cell lines, C.elegans UNC-53 associates with 
microtubules and preferentially with the plus-ends of the 
tubulin fibres. Transfection of plasmids pCDU3 and 
pCDU2 into N4 and MCF7 cell lines did not result in 
microtubule co-localisation. pCDU4 
resulted in no staining using the mab 16-48 antibody (LMBP 
Accession No. 1383CB), from which we conclude that the epitope for 
this antibody is localised outside the fragment 
expressed by pCDU4. 

It is possible that the microtubule association 
domain is situated in the N-terminus of the protein. 
For this reason, we constructed an N-terminal GFP 
fusion with the full length C.elegans UNC-53 sequence, 
and various C-terminal deletion derivatives. These 
fragments encode the N-terminal part of UNC-53 from 
139 to 760 aa. 

Furthermore, to analyse whether the cloned fragment of 
hu1-unc-53 could also be associated with microtubules, 
a plasmid encoding a GFP fusion with the hu1-Unc-53 
protein was constructed and introduced into mammalian 
cells. A derivative of this construct was also 
constructed. 

2. 

a) Transient expression of C.elegans Unc-53 GFP 
fusion in N4 neuroblastoma lines 

N4 cells were transiently transfected with 
pEGFP72, encoding a fusion protein of GFP and the full 
length C.elegans unc-53 sequence. On an inverted 
microscope, the fluorescence of the GFP molecule could 
be followed in living cells. Cells which expressed 
low to medium levels of the fusion molecule showed a 
normal morphology after 18h to 30h. In these cells 
the co-localisation of the GFP fusion protein with the 
microtubules could clearly be demonstrated (figure 
38a). In cells which demonstrated a low but still 
distinct GFP fluorescence, specific microtubule 
plus-end association could be observed (figure 38b). Cells 
expressing high levels of the GFP fusion protein tend 
to round up, in such a way that the microfilaments are 
difficult to visualise. After 48h, almost no GFP 
expressing cells can be found. It has previously been 
observed in transient expression of Unc-53, using 
plasmid pTB72, that the protein is toxic for the 
cells. The transient transfection experiments with 
the pEGFP72 plasmid give the same observation, 
indicating that at least two features of the Unc-53 
protein are conserved in the GFP fusion protein, namely 
the microtubule association and the toxicity of the 
protein. 

The transfected cells were fixed with 
paraformaldehyde, and the tubulin was stained using 
antibody YL1/2 and antimouse-CY3 (Jackson Labs). 
Although a significant loss of GFP fluorescence was 
observed, one could clearly demonstrate that the 
filaments observed with the GFP fluorescence 
co-localise with the microtubule staining (figure 39). 

Putative Assay 

Mammalian cells, in this case N4, were 
transfected with a lipofecting agent (LipofectAMINE) 
while in suspension, not being attached to a surface. 
After transfecting those cells with pEGFP72, the 
transfected cell suspension could be diluted in 24- 
and/or 96-well plates, enabling them to attach to the 
surface. Each well may contain a different compound 
from the collection to be screened. After 24h, plates could 
be automatically screened for fluorescence levels. 
Wells containing a compound that abolishes the toxicity 
of the GFP-C.elegans UNC-53 fusion protein will give 
high levels of fluorescence. Compounds having no 
effect on the fusion product will give no or only low 
levels of fluorescence. 
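The read-out of such a plate screen could be evaluated along the following lines. This is a sketch under stated assumptions, not a procedure given in the text: the threshold rule (mean of untreated control wells plus three standard deviations), the well names and the fluorescence values are all hypothetical.

```python
from statistics import mean, stdev


def call_hits(well_fluorescence, control_wells, n_sd=3.0):
    """Return compound wells whose fluorescence exceeds the control threshold.

    The threshold is the mean of the control wells plus n_sd standard
    deviations (an assumed rule, chosen for illustration).
    """
    controls = [well_fluorescence[w] for w in control_wells]
    threshold = mean(controls) + n_sd * stdev(controls)
    return sorted(
        well
        for well, value in well_fluorescence.items()
        if well not in control_wells and value > threshold
    )


# Hypothetical plate read-out in arbitrary fluorescence units.
plate = {
    "A1": 100.0, "A2": 110.0, "A3": 90.0, "A4": 105.0,  # no compound
    "B1": 95.0, "B2": 900.0, "B3": 120.0,                # test compounds
}
controls = {"A1", "A2", "A3", "A4"}

print(call_hits(plate, controls))
```

Because untreated transfected cells are expected to give little or no fluorescence, a compound that rescues the cells should stand out well above the control distribution; the choice of three standard deviations is only one of several reasonable cut-offs.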



b) Transient expression of the truncated 
GFP-C.elegans UNC-53 fusion proteins. 

To assay whether the microtubule association 
occurs in the N-terminal part of the C.elegans 
Unc-53 protein, various C-terminal deletions were 
constructed. 

Transfection of pEGFPsma and pEGFPecl, coding 
for 760 aa and 670 aa of the N-terminal part of C.elegans 
UNC-53 in fusion with GFP, resulted in microtubule 
association, as could be visualised in living cells. 
The association with the microtubules is less abundant 
than observed when expressing the full length 
C.elegans UNC-53 protein, but fibres could clearly be 
observed (figures 40a and 41a). More background 
fluorescence is seen. This could be due to a weaker 
association with the microtubules or to an instability of 
the fusion protein. The association with microtubules 
could not be observed after fixing the cells with 
paraformaldehyde nor with methanol fixation, giving an 
extra indication of the weak association of these 
proteins with the microtubule network or of potential 
instability of the fusion protein. At low expression 
levels the association of the GFP fusion protein with 
the centrosomes could clearly be detected (figures 40b 
and 41b). Centrosomes are the location in the cell 
with the highest microtubule concentration. 

No plus-end associations could be observed 
with the deletion constructs, even when cells were 
expressing low levels of the GFP fusion proteins. In 
the case of very low expression, the centrosomes 
could clearly be detected. 

When transfecting N4 cells with pEGFPsac or 
pEGFPxba, coding for 139 aa and 256 aa of the 
N-terminal part of C.elegans UNC-53 in fusion with GFP, 
no microtubule association could be observed. This 
indicates that at least 670 aa of the N-terminus of 
the C.elegans UNC-53 is needed for microtubule 
association (figures 42a and 42b). 

c) Transient expression of the GFP-hu-UNC-53/1 
fusion proteins and a deletion derivative. 

Plasmid pLM4 was transiently transfected 
into N4 neuroblastoma cells, and GFP fluorescence was 
observed in living cells. GFP fluorescence of the 
available sequence of hu1-UNC-53 in fusion with GFP 
was localised at the microtubule level. Moreover, at 
lower expression levels, both the centrosomes and 
specific plus-end association could be observed. As 
has been observed with the C.elegans UNC-53 
derivatives in fusion with GFP, expressed by the 
plasmids pEGFPsma and pEGFPecl, the association 
seems to be less tight than was observed with the full 
length C.elegans UNC-53 fragment in fusion with GFP. 
The observed instability of the fusion protein can be 
due to a weaker association with microtubules, or to 
degradation of the fusion protein (figure 43). 



d) Immunofluorescence on melanoma line G361, 
and on neuroblastoma line N4 transiently transfected 
with pLM4. 

Introduction 






Northern experiments show that the melanoma 
cancer line G361 abundantly expresses both the Human1 
and Human2 homologues of C.elegans UNC-53. To test whether 
the proteins could be localised in this cell line, a 
collection of mouse sera was tested on this cell line. 
To verify that the observation was due to recognition 
of hu-UNC-53, and not to an artifact, a positive serum 
was applied to N4 cells transiently transfected with 
pLM4, expressing the GFP-hu1-UNC-53 fusion. 

Result 

A serum, designated 28.1, from a mouse previously 
injected with the peptide DNRTLPKKGLYRY, a conserved 
sequence of the UNC-53 family, was used for an 
immunolocalisation experiment on G361 cells fixed with 
paraformaldehyde. Antimouse-Cy3 was applied as second 
antibody. Association with microtubule plus-ends could 
clearly be observed. Moreover, in cells showing 
directional movement, observed as growth cone 
extensions, abundant staining can be seen in the tip 
of the growth cone (figure 45). To test whether the 
microtubule associated protein recognised was 
identical to the Hu1-UNC-53 protein, N4 cells were 
transiently transfected with plasmid pLM4 and 
subsequently fixed with paraformaldehyde and stained 
with serum 28.1. Only cells that were transfected 
showed staining with 28.1, indicating that the 
antibody in serum 28.1 recognised the Hu1-UNC-53-GFP fusion 
protein (figure 46). This confirms that the staining 
of the microtubule plus-ends in the growth cones of 
G361 by serum 28.1 is due to recognition of at least 
the Human1 and/or the Human2 homologue. It is 
concluded that the overexpressed human 
homologue of C.elegans UNC-53 in the melanoma 
cancer line G361 is located on the microtubule 
plus-ends. 






Conclusions 

a) - GFP-C.elegans UNC-53 fusion protein 
expressed by pEGFP72 shows Unc-53 activity. 

b) - GFP-C.elegans UNC-53 fusion protein 
expressed by pEGFP72 shows microtubule association. 

c) - GFP-C.elegans UNC-53 fusion protein 
expressed by pEGFP72 shows microtubule plus-end 
association. 

d) - GFP-C.elegans UNC-53 (deletion variant) 
fusion proteins expressed by plasmids pEGFPsma and 
pEGFPecl show microtubule association. 

e) - GFP-C.elegans UNC-53 (deletion variant) 
fusion proteins expressed by plasmids pEGFPsma and 
pEGFPecl do not show microtubule plus-end association. 

f) - GFP-C.elegans UNC-53 (deletion variant) 
fusion proteins expressed by plasmids pEGFPxba and 
pEGFPsac do not show microtubule association. 

g) - GFP-hu1-UNC-53 fusion protein expressed by 
plasmid pLM4 shows microtubule association. 

h) - GFP-hu1-UNC-53 fusion protein expressed by 
plasmid pLM4 shows microtubule plus-end association. 

i) - serum 28.1 recognises the Hu1-UNC-53-GFP 
fusion protein as expressed by plasmid pLM4 in 
transiently transfected neuroblastoma cells N4. 

j) - the expressed human homologue of C.elegans 
UNC-53 in melanoma line G361 (being at least hu1-Unc-53) is 
associated with the microtubule plus-ends. 

EXPERIMENTAL PROCEDURES 
Materials 

The oligonucleotides used in the PCR-RACE 
experiments were synthesised by Eurogentec (Belgium). 
Radioactive compounds were obtained from Amersham. 
The pCDNA3.1 eukaryotic expression vectors, human 
λgt10 cDNA libraries, Marathon-RACE cDNAs, human 
Northern blots and the T7-tag monoclonal antibody were 
purchased from Invitrogen. N4, MCF7 and NIH 3T3 cells 
were retrieved from the Janssen Research cell bank. 

PCR-RACE conditions 

1. A quick screen human cDNA library panel was used 
to amplify EST clone gb..R41071. The primers used 
were ESTfw 5'-AATGGCTTCCTGGTTACCTGAG-3' and ESTrv 
5'-CAAGTCAGCACCCCGAAGCAGCTCT-3'. Human genomic DNA was 
also used as template (100 ng/reaction). The 
amplification conditions were as follows: 1 min at 
94°C, 30 sec at 55°C, 30 sec at 72°C, then 35 more 
times and a final extension of 20 min at 72°C. This 
PCR fragment was cloned in vector pCR2.1. The 
resulting plasmid was designated pCR231. 
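Basic properties of the two EST primers can be tabulated with a few lines of code. The sketch below uses the sequences as transcribed above and the Wallace rule (2°C per A or T, 4°C per G or C), which is a rough approximation intended for short oligonucleotides and is used here only for illustration; it is not the method by which the annealing temperature in the text was chosen.

```python
PRIMERS = {
    "ESTfw": "AATGGCTTCCTGGTTACCTGAG",
    "ESTrv": "CAAGTCAGCACCCCGAAGCAGCTCT",
}


def gc_count(sequence):
    """Number of G and C bases in a DNA sequence."""
    sequence = sequence.upper()
    return sequence.count("G") + sequence.count("C")


def wallace_tm(sequence):
    """Rough melting temperature in degrees C by the Wallace rule."""
    gc = gc_count(sequence)
    at = len(sequence) - gc
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    gc_percent = 100 * gc_count(seq) / len(seq)
    print(f"{name}: {len(seq)} nt, {gc_percent:.0f}% GC, Tm ~{wallace_tm(seq)} C")
```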

A human heart clone was also produced by RACE-PCR 
from a human heart Marathon cDNA using the following 
conditions: 1 min at 94°C, 30 sec at 70°C, 3 min 30 sec 
at 72°C, then 35 more times and a final extension of 
20 min at 72°C. KlenTaq DNA Polymerase was purchased 
from Invitrogen. 

For the mouse homologue, total RNA was obtained 
from N4 murine cells as described. A first strand 
cDNA was synthesized from 2 µg of RNA using the 
Ready-To-Go cDNA kit (Pharmacia). The primers used were M-ESTfw 
5'-CCTCTGTGGGCACCGAGGTCACC-3'. The amplification 
conditions were as follows: 1 min at 94°C, 30 sec at 
58°C, 30 sec at 72°C, then 35 more times and a final 
extension of 20 min at 72°C. All the amplification 
products were subcloned in pCRII-l and several 
independent clones were analyzed by sequencing. 

2. Screening of Human Heart/Colorectal Adenocarcinoma 
cDNA library 

A human heart cDNA library and a human colorectal 
adenocarcinoma cDNA library were screened using 
pCR231bp as probe by the standard plaque hybridization 
method. The screening produced several positive 
clones in each library, called respectively λHH3, λHH4, 
λHH15, λCAD14 and λCAD27. The positive phages were 
purified by two additional rounds of plaque screening 
and were then amplified. 

3. 5' extension using PCR 

Three primers with homology to the 5' end of 
clone λHH3b were made: 
HU53rv1 (5'-cct-ggg-act-gaa-gct-ggt-acc-tga-gcc-3'), 
HU53rv2 (5'-ttg-gga-aga-gtg-ttc-cga-tcc-cgc-tg-3') and 
HU53rv3 (5'-gtt-gcc-cag-ctc-tgg-ggc-ttc-cac-tcc-3'), and 
used together with the λgt10rv primer 
(5'-gag-gtg-gct-tat-gag-tat-ttc-ttc-cag-ggt-a-3') in three nested PCR 
reactions on an amplified cDNA library from Human Heart 
(Clontech). The reaction mixes contained 25 pmol of 
each primer, 1 mM of each dNTP, 1 µl KlenTaq Polymerase 
Mix (50x) and 0.1 ng DNA. The cycling parameters for 
the first PCR were: 3 min at 94°C, 35 cycles of 1 min 
at 94°C, 1 min at 51°C and 3 min at 72°C and a final 
extension of 10 min at 72°C, using HU53rv1 and λgt10rv 
as primers. 0.4 µl of this primary PCR product was 
amplified using HU53rv2 and λgt10rv as nested primers 
with the following parameters: 3 min at 94°C, 38 
cycles of 1 min at 94°C, 1 min at 52°C and 3 min 30 
sec at 72°C and a final extension of 10 min at 72°C. 
The second nested PCR reaction was performed on 0.4 µl 
of a 1/50 diluted purified 2.4 kb fragment using 
HU53rv3 and λgt10rv as primers: 3 min at 94°C, 35 
cycles of 1 min at 94°C, 1 min at 56°C and 3 min 30 
sec at 72°C and a final extension of 10 min at 72°C. 
A 774 bp amplification product was subcloned in 
pCR2.1, resulting in plasmid pCB210-14. The cloned 
fragment was analyzed by sequencing. This fragment 
extends 699 bp in the 5' direction (see fig 9). 

4. 5' extension using PCR 

Primer HU53rv4 (5'-ccc-tgc-ttg-gtg-ctg-agg-aga-ctg-g-3') 
was designed on the 5' end of clone pCB210-14 
and was used together with λgt10rv to amplify a 
fragment of the Human Heart cDNA library with the 
following parameters: 3 min at 94°C, 35 cycles of 1 
min at 94°C, 1 min at 60°C and 3 min 30 sec at 72°C 
and a final extension of 10 min at 72°C. An 887 bp 
fragment was subcloned in pCR2.1, resulting in plasmid 
pCB212. The cloned fragment was analyzed by 
sequencing. This fragment extends a further 767 bp in 
the 5' direction (see fig 9). 

5. Human Heart Library screening using the 0.8 kb 
insert of pCB212 as probe 

The EcoRI-digested and purified clone pCB212 was
used as probe to screen the Human Heart cDNA library
(Clontech) using the standard plaque hybridization method.
The positive phages were purified by two additional
rounds of plaque screening. The insert of the λDNA
(produced using the Qiagen Lambda Kit) was analyzed by
sequencing. This clone, pHH14-3, resulted in a 2663 bp
fragment overlapping pCB212, pCB210-14 and the 3' end
(434 bp) of λHH3b, and in a 761 bp 5' extension (see
fig 9).

3' and 5' extension of HU-Unc53/2 from EST46037

WashU-Merck EST 46037 

Transformed cells carrying the EST 46037 sequence
were ordered from Research Genetics. Plasmid DNA was
isolated using standard protocols (Qiagen plasmid DNA
isolation kit), and the sequence of the insert was
determined.

3' extension of EST 46037 by RACE

Marathon-Ready cDNAs (Clontech) are premade
"libraries" of adaptor-ligated double-stranded cDNA
ready for use as templates in RACE experiments.

Five µl Marathon-Ready cDNA was used as template in a
regular 50 µl RACE. The RACE mixture contained 1x
KlenTaq PCR buffer, 0.2 mM of each dNTP, 1x Advantage
KlenTaq polymerase mix (Clontech), 0.15 µM AP1 adaptor
primer and 0.15 µM RACE gene specific primer. The
amplification conditions were as follows:

94°C for 1 min, 5 cycles of 94°C for 30 s and 72°C for
4 min, 5 cycles of 94°C for 30 s and 70°C for 4 min, 25
cycles of 94°C for 30 s and 68°C for 4 min.

One-hundred-fold diluted RACE product was used as a
template in a nested PCR with AP2 adaptor and gene
specific nested PCR primers. Specific nested PCR
fragments were cloned into pCR2.1 (TA cloning kit,
Invitrogen) and the sequences of the inserts were
determined.
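
The RACE mixture is specified by final concentrations, so the pipetted volumes follow from C1·V1 = C2·V2. The Python sketch below works this out, reading the reaction as 5 µl template in a 50 µl volume. The 10x buffer and 10 mM dNTP stocks are assumptions made for illustration (the text does not give stock concentrations); the 50x polymerase mix is the stock strength quoted for the library PCR earlier in this section. Primers are left out because no stock concentration is stated, and `volume_from_stock` is a hypothetical helper name.

```python
def volume_from_stock(final_conc: float, stock_conc: float, total_ul: float) -> float:
    """C1*V1 = C2*V2: volume of stock (in µl) that gives final_conc in total_ul."""
    if stock_conc <= 0 or final_conc > stock_conc:
        raise ValueError("stock must be positive and at least as concentrated as the target")
    return final_conc * total_ul / stock_conc


TOTAL_UL = 50.0     # reaction volume
TEMPLATE_UL = 5.0   # Marathon-Ready cDNA

# name -> (final concentration, stock concentration), in matching units
COMPONENTS = {
    "KlenTaq PCR buffer (x)": (1.0, 10.0),      # ASSUMED 10x stock
    "each dNTP (mM)": (0.2, 10.0),              # ASSUMED 10 mM stock
    "KlenTaq polymerase mix (x)": (1.0, 50.0),  # 50x stock, as quoted for the library PCR
}

volumes = {name: volume_from_stock(final, stock, TOTAL_UL)
           for name, (final, stock) in COMPONENTS.items()}

# Volume left over for the two primers and water
remaining_ul = TOTAL_UL - TEMPLATE_UL - sum(volumes.values())

if __name__ == "__main__":
    for name, vol in volumes.items():
        print(f"{name}: {vol:.1f} µl")
    print(f"primers + water: {remaining_ul:.1f} µl")
```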

gene specific primer (EST46037-F1)
5' AGTGAGAACAATGCTGTGGACATGC
nested gene specific primer (ES46037-F2)
5' CTGCTCAACTGCAAGTACCACAAATGC
Marathon cDNA library: human placenta

WashU-Merck EST 923793

Transformed cells carrying the EST 923793
sequence were ordered from Research Genetics. Plasmid
DNA was isolated using standard protocols (Qiagen
plasmid DNA isolation kit), and the sequence of the insert
was determined.

RACE fragments 1.4 and 3.7, 5' extension of
EST46037

Method as described previously.
gene specific primer (EST46037-R1)
5' ACTGCCTTGAGACTCTGACTTCAGC
nested gene specific primer (ES46037-R2)
5' TGGGCAGAACTGAGAGCTTCTAAGC
Marathon cDNA library: human placenta

RACE fragments B2.1, D2.1, H2.1; 5' extension

Method as described previously:
gene specific primer (97010709)
5' ATTCTTTTGCATCTTCTTGCGTGCG
nested gene specific primer (97010708)
5' ACCTGAGTCCTTTCTTAGGCAAAGTGTTCC
Marathon cDNA library:
human placenta (fragment B2.1)
human HeLa S3 (fragment D2.1)
human colorectal adenocarcinoma SW480 (fragment H2.1)

PCR fragments E2.3, C2.3

EST 485068 is similar to but not identical with
the 5' end of HU-Unc53/1. A primer pair consisting of
one 3' EST 485068 primer and one 5' HU-Unc53/2 primer
was used to PCR amplify those fragments. A λgt10 human
placenta Quick screen library (fragment C2.3) or
Marathon cDNA from human HeLa S3 (fragment E2.3) was
used as template in a PCR. A 50 µl reaction mix
contained 1x PCR II buffer (Perkin-Elmer), 1.5 mM
MgCl2, 0.2 mM of each dNTP, 0.15 µM forward and
reverse primer, 2.5 U AmpliTaq Gold (Perkin-Elmer)
and 1 µl template. The cycling parameters were 5
minutes at 95°C, 35 cycles of 45 seconds
at 94°C, 45 seconds at 65°C and 2 minutes at 72°C.
The PCR products were excised from an agarose gel
and purified using a gel extraction kit (Qiagen); one
µl thereof was used in a second round PCR using the
same conditions as above. The PCR products were
purified (Qiagen PCR purification kit) and directly
sequenced.

primers : 

(97010709) 5' ATTCTTTTGCATCTTCTTGCGTGCG
(97012802) 5' CGCTCCCCATCAGATGCAGGCCGG

PCR fragment E1.3-3

EST 01222 is homologous but not identical with
the 5' end of HU-Unc53/1. A primer pair consisting of
one 3' EST 01222 primer and one 5' HU-Unc53/2 primer
was used to PCR amplify this fragment.
Marathon cDNA from human HeLa S3 was used as template
in a PCR. A 50 µl reaction mix contained 1x PCR II
buffer (Perkin-Elmer), 1.5 mM MgCl2, 0.02 mM of each
dNTP, 0.15 µM forward and reverse primer,
2.5 U AmpliTaq Gold (Perkin-Elmer) and 1 µl template.
The cycling parameters were 5 minutes at 95°C, 35
cycles of 45 seconds at 94°C, 45 seconds
at 65°C and 2 minutes at 72°C. The PCR products were
excised from an agarose gel and purified using a
gel extraction kit (Qiagen); one µl thereof was used in
a second round PCR using the same conditions as above.
The PCR products were analysed on an agarose gel, the
fragment of interest was excised, purified (Qiagen
PCR purification kit) and cloned into
pCR2.1. The sequence of the insert was determined.

RACE fragments A2.1-2, B2.1-4, D2.1-5; 5' extension

Method as described previously,
gene specific primer (97041701)
5' TATGCTACGGCCACTCATCTCCGTGG
nested gene specific primer (97041702)
5' TGTAACCTGAGTTCCCCTTAAACTGG
Marathon cDNA library:
human placenta (fragment A2.1-2)
human HeLa S3 (fragment B2.1-4)
human colorectal adenocarcinoma SW480 (fragment
D2.1-5)

Translation-initiation splice variants, fragments
D4.1-1, J4.1.4, G4.1.1, F4.1.2

Four different translation initiation splice
variants were detected by 5' RACE.

Method as described previously,
gene specific primer (97080803)
5' TCGGTTGTTAGCAGTAGTTGACCCTCC
nested gene specific primer (97080804)
5' ACCTGAAAGTCTGGACTGCATTTCAGC
Marathon cDNA library: human colorectal
adenocarcinoma SW480 (fragment D4.1-1)

gene specific primer (97080801)
5' ACAACCTGGATAATCTGGGCCAGGAGG
nested gene specific primer (97080802)
5' TCTTGCTGGAGATCCTTGATGAGACGC
Marathon cDNA library:
human melanoma G361 (fragment J4.1.4)
human HeLa S3 (fragment G4.1.1)
human placenta (fragment F4.1.2)

DNA sequencing 

PCR amplification products and cDNA clones were 
subcloned either into pBluescript vectors (Stratagene) 
or in PCR-IIa vector (Invitrogen) and sequenced either 
manually by the dideoxynucleotide chain termination 
method with modified T7 DNA polymerase (Sequenase, 
United States Biochemical) or automatically with an 
Applied Biosystems 373 DNA sequencer using the 
fluorescent terminator kit (Perkin Elmer) . 

RNA blots 

A human multiple tissue Northern (MTN-1,
Clontech) containing in each lane 2 µg of poly A+ RNA
from eight different human tissues (heart, brain,
placenta, lung, liver, skeletal muscle, kidney, and
pancreas) and an MTN-II human multiple tissue Northern,
containing in each lane 2 µg of poly A+ RNA from
spleen, thymus, prostate, testis, ovary, small
intestine, colon and peripheral leukocyte, were
hybridized according to the manufacturer's
instructions and washed in 0.1x SSC, 0.2% SDS at
55°C. Also from Clontech, a poly A+ RNA blot from
human cancer cell lines (melanoma G361, lung carcinoma
A549, colorectal adenocarcinoma SW480, Burkitt's
lymphoma Raji, leukemia Molt 4, lymphoblastic leukemia
K562, HeLa S3 and promyelocytic leukemia HL60) was
tested.



Construction of plasmids



Plasmid pCDU2 (Figure 17) was constructed by
cloning the 2.8 kb ApaI-NarI fragment from pTB72, the
latter restriction site made blunt with Klenow enzyme,
into pcDNA3 digested with EcoRV and ApaI. pCDU2
encodes the homology blocks A, B, C, D and E.
Plasmid pCDU3 (Figure 15) was constructed by cloning
the 1.9 kb ApaI-NdeI fragment from pTB72, the latter
restriction site made blunt with Klenow enzyme, into
pcDNA3 digested with EcoRV and ApaI. pCDU3 encodes
the homology blocks C, D and E. Plasmid pCDU4
(Figure 16) was constructed by cloning the 1.4 kb
ApaI-StyI fragment from pTB72, the latter restriction
site made blunt with Klenow enzyme, into pcDNA3 digested with
EcoRV and ApaI. pCDU4 encodes the homology block
E.



Expression of a domain of the human UNC53 in 
eukaryotic cells 



1. pCB201: Equivalent construct of human 1
homologue to expression construct pCDU4 of C. elegans 
unc-53 gene cloned in a eukaryotic His-tag, Xpress Ab 
tag expression vector. 



A suitable BamHI site was engineered on the pHH15
open reading frame by amplification with hh15fw primer
(5'-AGAGCGGATCCATATGCCTCCTTGCCGTCAAGGTG-3') and M13rv
primer (5'-cag-gaa-aca-gct-atg-ac-3'). The amplified
fragment was then moved to the pcDNA3.1/HisA vector
digested with BamHI and EcoRI. This new plasmid,
called pCB201 (Figure 13), produces a cDNA which codes
for a fusion protein consisting of a 49 amino acid
aminoterminal fragment containing a His-tag and also
a T7 epitope tag, followed by amino acids 1255 to 1627
of the sequence of the human homologue. pCB201 was
also checked by sequencing and then was used in stable
transfection experiments carried out in N4, MCF7 and
NIH3T3 cells.


2. pLM5: Equivalent construct of human 1
homologue to expression construct pCDU3 cloned in a
eukaryotic His-tag, Xpress Ab tag expression vector.

The phage HH3b was linearized using XhoI. A
BamHI and an XbaI restriction site were created on the
pHH3b open reading frame using U3-Bfw (5'-cca-cac-tag-
ggg-atc-cat-gca-aat-gag-g-3') and U-rv (5'-caa-aag-
tct-cta-gag-gag-gcc-agt-3') as primers. This
amplified fragment was then moved to pBluescript KS
digested with BamHI and XbaI. Sequencing of this
plasmid, named pCB300, showed an amino acid change
from a serine to an asparagine due to a change from
guanine to adenine at position 4237 of the DNA
sequence. This fault was repaired by cloning a 1418
bp fragment of pLM1 (see below) (using NarI and XbaI
as enzymes) into pCB300 digested with the same
enzymes. The phage HH3b fragment of this plasmid,
named pLM6 (fig 53), was then moved, using BamHI and
XbaI, to pcDNA3.1/HisA digested with the same enzymes.
This new plasmid, named pLM5 (fig 52), produces a cDNA
which codes for a fusion protein consisting of a 49
amino acid aminoterminal fragment harboring a His-tag
and a T7 epitope tag, followed by amino acids 1069 to
1627 of the transcript of HU-Unc53/1. Plasmid pLM5 was
checked by sequencing and used in transient and stable
transfection experiments carried out in N4 cells. The
plasmid pLM1 was created using a PvuII and partial
BamHI digested fragment of pHH14-3 and a BamHI and
SpeI digested fragment of phage HH3b, cloned into
pBluescript KS digested with SmaI and SpeI. pLM1
contains the full transcript of HU-UNC-53/1 available
at this moment (see fig 9).

3. pCB251: Equivalent construct of human 1
homologue to expression construct pCDU2 cloned in a
eukaryotic His-tag, Xpress Ab tag expression vector

The phage HH3b was linearized using XhoI. A
BamHI and an XbaI restriction site were created on the
pHH3b open reading frame using U2fw (5'-aag-gga-tga-
ttc-ggt-cag-gat-cct-tc-3') and U-rv (5'-caa-aag-tct-
cta-gag-gag-gcc-agt-3') as primers. The amplified
fragment was then moved to pCR2.1. This plasmid was
named pCB250. The pHH3b fragment was removed from
pCB250 using BamHI and XbaI and cloned in
pcDNA3.1/HisC digested with the same enzymes. This
plasmid, named pCB251 (figure 55), was checked by
sequencing. pCB251 produces a cDNA which codes for a
fusion protein consisting of a 49 amino acid
aminoterminal fragment harboring a His-tag and a T7
epitope tag, followed by amino acids 828 to 1627 of
the partial transcript of HU-Unc53/1. pCB251 was used
in transient and stable transfection experiments
carried out in N4 cells (see fig 56).

4. pLM3: the partial transcript of HU-Unc53/1
cloned in a eukaryotic His-tag, Xpress Ab tag
expression vector

pLM1 was digested with EcoRV and XbaI. This
fragment was cloned in pcDNA3.1/HisB digested with
the same enzymes. pLM3 produces a cDNA which codes
for a fusion protein consisting of a 49 amino acid
aminoterminal fragment harboring a His-tag and a T7
epitope tag, followed by amino acids 1 to 1627 of the
transcript of HU-Unc53/1 available at this moment.
pLM3 was used in transient and stable transfection
experiments carried out in N4 cells.
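
The four His-tagged constructs described above differ only in where the HU-Unc53/1 portion starts, so the expected length of each fusion protein follows from the stated residue ranges. The Python sketch below does that arithmetic. It assumes the 49-residue aminoterminal fragment is joined directly to the first listed residue with no additional linker residues, which the text does not state explicitly; `fusion_length` is a hypothetical helper name.

```python
TAG_LENGTH = 49       # aminoterminal His-tag / T7 epitope fragment, as stated in the text
LAST_RESIDUE = 1627   # last HU-Unc53/1 residue quoted for every construct

# construct -> first HU-Unc53/1 residue included in the fusion
CONSTRUCTS = {"pCB201": 1255, "pLM5": 1069, "pCB251": 828, "pLM3": 1}


def fusion_length(first_residue: int,
                  last_residue: int = LAST_RESIDUE,
                  tag_length: int = TAG_LENGTH) -> int:
    """Residues in the tag plus the inclusive range first_residue..last_residue."""
    if not 1 <= first_residue <= last_residue:
        raise ValueError("residue range is empty or out of order")
    return tag_length + (last_residue - first_residue + 1)


if __name__ == "__main__":
    for name, first in CONSTRUCTS.items():
        print(f"{name}: residues {first}-{LAST_RESIDUE}, "
              f"{fusion_length(first)} aa including tag")
```

Such predicted lengths are mainly useful for judging whether a band seen on a Western blot is in the expected size range for a given construct.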

5. pLM4: the partial transcript of HU-Unc53/1
cloned in a eukaryotic GFP expression vector

pLM1 was digested with ClaI and XbaI. This
fragment was cloned in pEGFP-C1 digested with AccI
and XbaI. This plasmid, named pLM4,
produces a cDNA which codes for a fusion protein
consisting of GFP followed by amino acids 1 to 1627 of
the transcript of HU-Unc53/1. pLM4 was used in
transient and stable transfection experiments carried
out in N4 cells (see figs 43 and 46).

Stable transfection of MCF-7 cells



Cells were seeded at a density of 2 x 10^6 cells in
a 75 cm² flask using standard culture medium
(Dulbecco's MEM, 450 mg/l glucose, 862 mg/l L-Alanyl-
L-Glutamine, 110 mg/l Na-pyruvate; GibcoBRL)
supplemented with 10% foetal calf serum (FCS;
GibcoBRL), 100 U/ml penicillin (GibcoBRL) and 100
µg/ml streptomycin. The culture was grown at 37°C in
a 10% CO2 atmosphere to approximately 70% confluency
(approximately 18 hours). The culture medium was
removed and 10 ml MEM-HEPES (GibcoBRL) supplemented
with 10% FCS was added to the cells. The culture was
further incubated for four hours at 37°C in standard
sterile air. DNA-CaCl2 was meanwhile prepared by
mixing 30 µg DNA in 0.1x TE (1 mM Tris-HCl, pH 7.2,
0.1 mM EDTA, pH 8) and 0.1 ml 1.25 M CaCl2/HEPES (1.25
M CaCl2, 0.125 M HEPES; pH 7.05). 0.1x TE was added
to a final volume of 0.5 ml. The DNA-CaCl2 was added
drop by drop to 0.5 ml BS/HEPES (25 mM HEPES, 0.25 M
NaCl, 0.01 M KCl, 1.4 mM Na2HPO4, 0.01 M glucose, pH
7.05) while pipetting a sterile airflow through the
latter solution. The DNA-Ca3(PO4)2 precipitate was
then placed at 37°C for ten minutes. The DNA-Ca3(PO4)2
precipitate was vortexed and added to the cells,
together with 100 µl of a 0.01 M chloroquine (Sigma)
stock in H2O. After four hours of incubation at 37°C
in sterile standard air, the medium was removed, and
the cells were washed with PBS (GibcoBRL). 25 ml of
medium was added and the cells were placed at 37°C in
a 10% CO2 atmosphere. After 48 hours of incubation,
the cells were harvested, diluted and cultivated under
selection (600 µg/ml G418 (Duchefa)) for two weeks
prior to clone selection. Mock transfected MCF-7 were
positive for the beta-galactosidase transgene. The
stability of transfection in MCF-7 was assessed by
passaging cells four times in the absence of Geneticin
and then re-exposing them to the selection agent. In
these experiments, unc-53 or mock transfected cells
proliferated, whereas untransfected MCF-7 cells
proliferated at a much slower rate.
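
The nominal concentrations in the calcium phosphate mixture follow from simple dilution. The Python sketch below computes them from the volumes given above. It is only a bookkeeping aid under two stated assumptions: volumes are treated as additive, and the figures are concentrations before any precipitate forms. The helper name `mix` is hypothetical.

```python
def mix(parts):
    """Concentration after combining (concentration, volume) parts.

    Volumes are assumed to be additive; any consistent units may be used.
    """
    total_volume = sum(volume for _, volume in parts)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    return sum(conc * volume for conc, volume in parts) / total_volume


# DNA-CaCl2 solution: 0.1 ml of 1.25 M CaCl2 made up to 0.5 ml with DNA in 0.1x TE
cacl2_in_dna_mix = mix([(1.25, 0.1), (0.0, 0.4)])                   # mol/l

# After dropwise addition to 0.5 ml BS/HEPES (no CaCl2, 1.4 mM Na2HPO4)
cacl2_in_precipitate = mix([(cacl2_in_dna_mix, 0.5), (0.0, 0.5)])   # mol/l
phosphate_in_precipitate = mix([(0.0, 0.5), (1.4, 0.5)])            # mmol/l

if __name__ == "__main__":
    print(f"CaCl2 in DNA mix:      {cacl2_in_dna_mix:.3f} M")
    print(f"CaCl2 after mixing:    {cacl2_in_precipitate:.3f} M")
    print(f"phosphate after mixing: {phosphate_in_precipitate:.2f} mM")
```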

Stable transfection of N4 neuroblastoma

Cells were seeded at a density of 2 x 10^6 cells in
a 25 cm² flask using standard culture medium (MEM
Rega 3 (GibcoBRL) supplemented with 10% FCS, 0.14%
Na2CO3, 2 mM glutamine, 100 U/ml penicillin and
100 µg/ml streptomycin). The culture was grown
overnight at 37°C in a 10% CO2 atmosphere. Transfection
mixture was prepared by adding 12 µg DNA in 600 µl
Optimem 1 (GibcoBRL) to 36 µl LipofectAMINE (GibcoBRL)
in 600 µl Optimem 1. This was done by adding the first
solution drop by drop to the second. The mixture was
left for 30 minutes at room temperature, after which
1.8 ml of Optimem 1 was added. Meanwhile the
cell culture was washed twice with Optimem 1, and the
3 ml of transfection mixture was added. The culture
was placed at 37°C in sterile standard air. After
four hours, 3 ml of normal culture medium was added
and the culture was placed at 37°C under 10% CO2.
18 hours later, the culture was washed with PBS, and
fresh normal culture medium was added. A further 24
hours later, the cells were harvested, diluted and
cultured under selection (750 µg/ml G418) for two
weeks prior to clone selection.

Fixation of cells for immunofluorescence

Medium was removed from the 9 cm² wells
containing the coverslips. A 4% solution of
paraformaldehyde (Sigma) in PHEM (1 g/l glucose, 0.4
g/l KCl, 8 g/l NaCl, 0.06 g/l KH2PO4, 0.0475 g/l
Na2HPO4, 0.35 g/l NaHCO3, 1.51 g/l PIPES, 0.76 g/l
EGTA, 0.19 g/l MgCl2; pH 6) was added for 30 min at
room temperature. The fixative was removed, and the
coverslips were washed three times for 10 minutes with
PHEM. The coverslips were then placed in PHEM
containing 0.5% Triton X-100 (Serva) for 30 minutes,
after which they were washed again three times for
10 minutes with PHEM. The coverslips were then placed
under PBS (0.14 M NaCl, 2.7 mM KCl, 10 mM Na2HPO4, 1.8
mM KH2PO4, pH 7.3) containing 0.2% Tween (Sigma) for at
least one hour at 4°C.

Immunofluorescence staining

The coverslips were inverted on 35 µl of
appropriately diluted antibody, being YL 1/2 for
tubulin and/or mab 16-48-2 monoclonal or anti-UNC53
(gp48) polyclonal antibody for UNC53. The slides were
placed at 4°C for at least 18 hours. Excess
primary antibody was then removed by three washes of
ten minutes in PBS-Tween. The slides were then
treated with secondary antibody in the same way as for
the primary antibody. F-actin was labelled by
including TRITC- or FITC-coupled phalloidin in the
incubation buffer. The slides inverted on the
secondary antibody were left at room temperature for
approximately one hour. Slides were then washed again
three times for ten minutes with PBS-Tween and once
with PBS. The coverslips were mounted on slides with
the medium described by Herzog et al. (Cell Biology:
a laboratory handbook, 1994, Academic Press, 355-360).
After at least two hours, slides were ready for
analysis.

Time lapse analysis

Analysis of the behaviour and movement of growing
cell cultures was done by placing a non-confluent
culture under a phase contrast microscope equipped
with a temperature controlled stage (37°C). Images
were recorded using a CCD camera (COHU 4912) coupled
to a SCION LG3 framegrabber in a Macintosh PPC 8100
running NIH Image version 1.60. Images were recorded
at time intervals varying from 15 sec to 1 min, for
half an hour to two hours. Image enhancement and
playback were done in NIH Image.
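
The recording intervals and durations quoted above bound the size of each time lapse series. The short Python sketch below computes the number of frames for the two extremes; it assumes a frame is captured at time zero and at every full interval thereafter, which is an illustrative convention and not something the text specifies. `frame_count` is a hypothetical helper name.

```python
import math


def frame_count(duration_s: float, interval_s: float) -> int:
    """Number of frames captured, counting the frame taken at t = 0."""
    if duration_s < 0 or interval_s <= 0:
        raise ValueError("duration must be non-negative and interval positive")
    return math.floor(duration_s / interval_s) + 1


# Extremes of the ranges given in the text
shortest_series = frame_count(30 * 60, 60)       # half an hour, one frame per minute
longest_series = frame_count(2 * 60 * 60, 15)    # two hours, one frame per 15 s

if __name__ == "__main__":
    print(f"{shortest_series} to {longest_series} frames per series")
```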

A variety of cell types were shown to migrate
over colloidal gold coated culture plastic or glass
and displace or phagocytose the gold lawn on their way
while locomoting. The track left bare is a
qualitative and quantitative measure of cell motility
and/or locomotion. The basic methods have been
described in detail elsewhere (Albrecht-Buehler, 1977,
Cell, 11: 395; Zetter, 1980, Nature, 285: 41; O'Keefe
et al., 1983, J. Invest. Dermatol., 85: 130). Culture
plates were gelatin and gold coated as described by
Albrecht-Buehler (1977). Unc-53 and mock transfected
MCF-7 were seeded in plates at low density and allowed
to adhere to the plate and to locomote overnight.
Cells were chemically fixed to the plate, washed and
air-dried. Images of the gold lawns were captured
using automated videomicroscopy; composite images of
the wells were generated and single-cell phagokinesis
tracks were measured using a home-made routine in
SCIL™ software.

C. elegans UNC-53 preferentially binds
microtubule plus ends or GTP-tubulin

1. Cloning of C. elegans cDNA in pEGFP-C1 and
construction of C-terminal deletion derivatives.

a) Construction of GFP-UNC53 N-terminal fusion:
A PCR experiment was performed under
standard conditions, using pTB72 as template and cp17
(ata gcc aga tct acg tca aat gta gaa ttg) and cp18
(ttt aga aac cgc ggg tgg) as primers. The resulting
0.4 kb fragment, coding for the N-terminal fragment of
C. elegans Unc53, was cloned in vector pCR2.1 (original
TA cloning kit, Invitrogen), resulting in plasmid
pTA1718. The 0.4 kb fragment was isolated as a BglII-SacII
fragment and cloned in pEGFP-C1 (Clontech)
digested with the same enzymes. The resulting plasmid
was designated pEGFPsac (Figure 29). pEGFPsac encodes
the N-terminal 13 aa of C.e. Unc53 in fusion with GFP.

b) Construction of a GFP-C.e. UNC53 full length
fusion:

Plasmid pTB72 (shown in Figure 1) was
digested with restriction enzymes SacII and ApaI. The
resulting 4.5 kb cDNA fragment, encoding the
C-terminal fragment of C. elegans Unc53, was cloned in
plasmid pEGFPsac (Figure 29) digested with the same
enzymes, resulting in plasmid pEGFP72 (Figure 30).
Plasmid pEGFP72 encodes GFP in fusion with the full
length C.e. Unc53.

c) Construction of N-terminal deletions of GFP-
C. elegans unc-53 fusion protein other than pEGFPsac:

pEGFP72 was digested with SmaI. The
resulting 7.0 kb fragment was religated and
transformed in E. coli, resulting in plasmid pEGFPsma
(Figure 31). This plasmid codes for the first 760 aa
of the Ce-UNC-53 in fusion with GFP.

pEGFP72 was digested with restriction enzymes
Ecl136II and SmaI; the resulting plasmid, after
ligation and transformation in E. coli of the 6.7 kb
fragment, was designated pEGFPecl (Figure 32). This
plasmid codes for the N-terminal 670 aa of the C.e.
Unc53 in fusion with GFP. pEGFP72 was further
digested with SmaI and XbaI. The latter site was made
blunt with Klenow polymerase. The resulting fragment
of 5.4 kb was religated and transformed in E. coli.
The resulting plasmid was designated pEGFPxba (Figure
33). This plasmid codes for the N-terminal 256 aa of
C. elegans Unc53 in fusion with GFP.

2. Constructing a hu1-UNC-53-GFP fusion and a
deletion derivative

The 5.4 kb hu1-unc53 fragment was isolated as a
ClaI-XbaI fragment from pLM1 (Figure 54), and cloned
in pEGFP-C1 digested with AccI and XbaI. pEGFP-C1 was
isolated from E. coli GM41 (Hfr H, dam-3, thi-1, rel-1).
This makes the XbaI restriction site available
for restriction digest. The resulting plasmid was
designated pLM4 (Figure 34).

3. Visualisation of GFP fluorescence in N4
cells

N4 neuroblastoma lines were seeded in Lab-Tek
chambered coverglass (Nalge Nunc International) and
transfected using LipofectAMINE (GibcoBRL). After 18
hours, the chambered coverglasses were placed on an
inverted microscope, and GFP fluorescence could be
visualised.



4. Staining GFP fusion expressing cells with
antibodies

Transfection with the GFP fusion constructs was
also performed on coverglasses in a 6-well plate.
After paraformaldehyde or methanol-acetone fixation,
cells could be stained for actin cytoskeleton with
TRITC-phalloidin, for hu-unc53 with sera 28.1 and for
tubulin with YL1/2 antibody. Visualisation was then
performed on an Axioplan microscope (Zeiss).

Methods of Producing and Observing the Effects of 
a Chimeric unc-53 Gene 

1. Definition of a promoter region in the unc-53
C. elegans gene:

The genomic region from position 15621 to
18415 in the C. elegans unc-53 gene, called promoter A,
was cloned and fused to the cDNA of the GFP gene
(clone pA/GFP, or pNP10) (cf. fig. 51). This construct
was injected into wild type worms (N2). Transgenic
lines express GFP in different neurones: the two pairs
of pioneering neurones PVP and PVQ, both BDU neurones,
both ALN and PLN neurones, both PDE neurones, both PVM
neurones, and 4 vulval cells. Expression begins in
early embryogenesis, when the axons of those neurones
grow out.

2. Mutant phenotype in unc-53(n152) alleles:

In wild type worms (N2), the two pairs of ALN and
PLN neurones each send an axon in a straight line
longitudinally from the tail to the head (see
fig. 50a). In unc-53(n152) alleles, the axons are
shorter and often branch in a dorsoventral direction
(see fig. 50b). The neurones are visualised with the
construct pA/GFP, injected in unc-53(n152) worms.

3. The minigene pA/unc-53 rescues the 
elongation defect of ALN and PLN neurones: 






The promoter A from the C. elegans unc-53 gene was
fused to the cDNA of the C. elegans unc-53 gene (clone
pA/unc53, or pNP9). This construct was injected in
unc-53(n152) mutant worms, together with the pA/GFP
construct described above to visualise the ALN and PLN
neurones. The elongation defects of those neurones in
the unc-53 mutant are almost completely restored by
the expression of the unc-53 cDNA under the
promoter A (see figs. 50 and 51b).

4. Domain swap between the C. elegans and human
unc-53 genes:

To test whether the vertebrate and worm members
of the unc-53 family are functionally equivalent, we
tested the ability of the human gene to rescue the
mutant phenotype in the worm. We replaced the
carboxyterminal predicted nucleotide binding domain
(NTPase) of the worm protein with the homologous
fragment of the human 1 gene.

The C. elegans NTPase domain was deleted from the
clone pA/unc-53, from the HpaI site at position 29800 of
the genomic sequence of unc-53, and replaced by the equivalent
domain of the human-1 gene (unc-53H1) (see fig. 51).
The resulting clone is named pA/unc-53H1. When this
clone is injected into unc-53(n152) mutants, the
transgenic worms show a significant but incomplete
rescue of the defect in the elongation of the ALN and
PLN neurones (see fig. 51b). The axons are longer,
often elongated as far as the region of the vulva in a
straight line, and no longer branch dorsally.
This result shows that the NTPase region of the human
unc-53 homologue can functionally replace the NTPase
region of the C. elegans protein.
The degree of rescue was analyzed quantitatively
and is summarized in Figure 51b.
The four strains compared are:

wt; unc-53(n152); unc-53(n152), pA/unc-53;
unc-53(n152), pA/unc-53-H1.

The various phenotypes observed are grouped
into three large classes:

«wild type»: the axon is straight, unbranched and
migrates into the head;
«vulva»: the axon is straight, unbranched and stops
in the vulva region;
«mutant»: the axon is short, does not reach the vulva
region and has collateral branches.

The figures are indicated as a percentage. The number
of axons observed is indicated in the following
column.
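
The scoring scheme above reduces each strain to three percentages and a count of axons observed. A minimal Python sketch of that tabulation follows. The function name `class_percentages` is hypothetical and the example scores are invented purely for illustration; they are not the values reported in Figure 51b.

```python
from collections import Counter

# The three phenotype classes defined in the text
CLASSES = ("wild type", "vulva", "mutant")


def class_percentages(scores):
    """Return ({class: percentage}, n) for a list of per-axon class labels."""
    counts = Counter(scores)
    unknown = set(counts) - set(CLASSES)
    if unknown:
        raise ValueError(f"unknown phenotype class: {sorted(unknown)}")
    n = len(scores)
    if n == 0:
        raise ValueError("no axons scored")
    return {c: 100.0 * counts[c] / n for c in CLASSES}, n


# HYPOTHETICAL scores for one strain, for illustration only
example_scores = ["wild type"] * 5 + ["vulva"] * 3 + ["mutant"] * 2
percentages, n_axons = class_percentages(example_scores)

if __name__ == "__main__":
    for phenotype, pct in percentages.items():
        print(f"{phenotype}: {pct:.0f}%")
    print(f"axons observed: {n_axons}")
```

Reporting the number of axons alongside the percentages, as the figure does, lets a reader judge how much weight each percentage can bear.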

The data demonstrate that
the nematode/human chimera minigene pA/unc-53-H1
partly rescues the defects of the axonal migration of
the ALN and PLN neurones, and demonstrate conservation
of function of this domain between man and worm. The
transgenic lines provide a functional screening assay
for the motility function of at least part of the
human UNC-53 gene.

II. Materials and methods

1 - Cloning:
a) pAB/GFP (pNP3 - Figure 27)

The gene of GFP has been amplified by PCR with
the oligonucleotides cpn3
"acattaagcttcgtacgcttggagggtaccg" and cpn5
"gaaaggatccgtacgataaggtattttgtgtcgg" on the plasmid
pPD95.75 (Figure 59) so as to be inserted at the 5'
position in fusion into exon 12 of the unc-53 gene
at a single restriction site SplI; it contains its
stop codon at 3' plus one polyadenylation site. The
PCR amplification product is digested by HindIII and
BamHI, sites which are contained respectively in the
cpn3 and cpn5 oligonucleotides, and sub-cloned in the
pBS vector (clone pNP2). The GFP is then excised from
the pNP2 clone at the site SplI and integrated into
the X16 clone (Figure 60) originating from sub-cloning
of the lambda phage S4 digested by XhoI. The X16 clone
containing the genomic sequence of unc-53 from the

v-tj-i^ A 4- A 1/^^11 4- i t *-v>-* n A Q Q 1 -i *-i 4- K a 

^VQX VXWll X «J \J 4C< X ^.W £J W .J .1. L.X^1I A. -¥ W .X. WXW1I«W .!> 1 1 

site Xhol of pBS. 

b) pAB/unc-53 (pNP8 - Figure 35) 

The promoter region AB of the X16 clone (between 
PstI and SplI) has been inserted in the clone pTB115 
(Figure 58) in which the region between the sites PstI 
and SplI, containing the promoter of the gene mec-7 
and the start of the gene unc-53, has been removed. 

c) pA/GFP (pNP10 - Figure 56)

The promoter region A has been taken from the X16 clone
between the sites PstI and NheI and integrated in the
vector pPD95.75 containing the GFP in the sites PstI
and XbaI.

d) pA/unc-53 (pNP9 - Figure 44) 

The promoter region A has been taken from the X16 clone
between the sites PstI and BstXI and is integrated
into the clone pTB115 in which the region between the
sites PstI and BstXI, containing the promoter of the
gene mec-7 and the start of the gene unc-53, has been
removed.

e) pA/unc-53-H1 (pCB501 - Figure 57)

The 3' region of the nematode unc-53 gene, between
the sites HpaI and NcoI, has been deleted from the
clone pA/unc-53 (pNP9). The 3' region of the
H1unc-53 gene has been amplified by PCR with the
oligonucleotides U4Afw (5'-gca-cat-cgt-taa-cgg-gga-
ctt-gaa-gc-3') and Urv (5'-caa-aag-tct-cta-gag-gcc-
agt-3') and digested with HpaI and XbaI. After a
filling-in step with T4 polymerase, the ligation is
effected with blunt ends.


2 - Injection

Conventional injection techniques are used (Fire A.,
1986; Mello C. et al., 1991; Mello C. and Fire
A., 1995). Young hermaphrodite adults are injected in
their two syncytial gonads. The DNA used is prepared
in standard manner (Qiagen) followed by precipitation
with lithium chloride. After an extensive rinsing
stage to eliminate all the salts, the DNA is
resuspended in water. The injection solution contains
the different DNAs at a concentration of 100 ng/µl in
an injection buffer. The plasmid pRF4 containing the
dominant allele su1006 of the gene rol-6 (Kramer J.
et al., 1990; Mello C. et al., 1991) is used as a
transformation co-marker. The descendants of roller
phenotype of the injected hermaphrodite are isolated.
Approximately 10% of these transformants will yield a
stable strain, in which the different DNAs injected
are associated to form a mini-chromosome which will
segregate as unstable extrachromosomal arrays. All the
transgenic strains obtained were tested by PCR for the
presence of the DNA injected, using a specific primer
of the vector and a primer in the gene (results not
shown).


3. Microscopy

The nematodes are observed under a ZEISS Axioplan
microscope provided with Nomarski optics, with 40X
Neofluar, 63X Plan-Apochromat and 100X Plan-Apochromat
objective lenses. For fluorescence observation the
light source is a mercury bulb. Different ZEISS
filters are used:

- for observation of GFP fluorescence, FITC filter:
blue excitation line at 588 nm, emission through a
515-565 nm band-pass filter;

- for observation of the antibody labelling with a
secondary antibody coupled to TRITC:

CAUX WCA UXUll WiiX w v^vj A » *s -m v-> * »*«» jp- — — — — — — — f — •— — — 

through a 590 nm long-pass filter. 
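The two emission filters listed above can be modelled, in a very idealised way, as simple wavelength windows. The sketch below assumes perfectly sharp filter edges, which real filters do not have; the function names are hypothetical and the test wavelengths are arbitrary.

```python
def passes_band_pass(wavelength_nm: float,
                     low_nm: float = 515.0,
                     high_nm: float = 565.0) -> bool:
    """True if the wavelength lies inside the idealised 515-565 nm band-pass window."""
    return low_nm <= wavelength_nm <= high_nm


def passes_long_pass(wavelength_nm: float, cutoff_nm: float = 590.0) -> bool:
    """True if the wavelength is at or above the idealised 590 nm long-pass cut-off."""
    return wavelength_nm >= cutoff_nm


# Arbitrary test wavelengths, in nm.
for wl in (520.0, 600.0):
    print(wl, passes_band_pass(wl), passes_long_pass(wl))
```

In this idealised model the two windows do not overlap, so light passed by one emission filter is rejected by the other.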

The image acquisition is effected by means of a CCD 
camera and the NIH Image program using a Macintosh 
computer. The images are processed using the Adobe 
Photoshop program. 
Sequence Listing 



The following sequences are referred to in the 
specification: 

Sequence ID No 1 is an amino acid sequence of 
human homologue 1 of UNC-53 protein illustrated in 
Figure 9b. 

Sequence ID No 2 is an amino acid sequence of 
human homologue 2 of UNC-53 protein illustrated in 
Figure 11d. 

Sequence ID No 3 is a nucleic acid sequence of 
the hu-1-unc-53 gene illustrated in Figure 9b. 

Sequence ID No 4 is a nucleic acid sequence of 
the hu-2-unc-53 gene illustrated in Figure 11d. 

Sequence ID No 5 is a nucleotide sequence of 
Phage Lambda Clone 3b deposited under Accession No LMBP 
3595 and illustrated in Figure 9. 

Sequence ID No 6 is a nucleotide sequence of 
plasmid pLM1 deposited under Accession No LMBP 3762 
and illustrated in Figure 54. 

Sequence ID No 7 is a nucleotide sequence of 
plasmid pLM4 deposited under Accession No 3763 and 
illustrated in Figure 34. 

Sequence ID No 8 is a nucleotide sequence of 
plasmid pEGFP72 deposited under LMBP Accession No 3764 
and illustrated in Figure 30. 

Sequence ID No 9 is a nucleotide sequence of 
plasmid pCB501 deposited under Accession No 3765 and 
illustrated in Figure 57. 

Sequence ID No 10 is a nucleotide sequence of 
plasmid pCB201 deposited under Accession No LMBP 
3594. 
[Figure: Hu-Unc53/1 seq (1 > 6013), Site and Sequence 
(printout dated Tuesday, 18 November 1997). 
Double-stranded nucleotide sequence with amino acid 
translation, annotated with the pCR2.1 linker, the 
lambda gt10 primer, an EcoRI site, the clones pHH14-3, 
pCB212, pCB210-14, pHH3b and pHH15, the reverse primers 
and peptide markers, and the open reading frames 
ORF (1-579 bp) = pLM7 ORF, full available ORF 
HU-Unc53/1 = pLM1 ORF, U2 ORF = pCB251 ORF, 
U3 ORF = pLM5 ORF and U4 ORF = pCB201 ORF.] 



WO 98/24810 112 PCT/EP97/06956 




BNSDOCID: <WO 982481 0A2J_> 



WO 98/24810 




PCIYEP97/06956 



is: 
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3 6e# 



GaTaTC TGC AGAATTCGGC TTC T TTG AG CA AG T7CAGCC TGG TTAAGTCC A AGC TGA AT TCCGGGGaAaGCCG AGCCGGA TCCC FCGAGGACCC FA TGC 3 

C TA7AGACGTCTTAAGCCGAAGAAAC TCGTTCAAGTCGGACC AATTCAGGTTCGACfTAAGGCCCC T TTCGGC TCGGCCTAGGGAGC TCCrGGGATACG* - ^ 

pCR2.1 linker lambda gt 10 primer EcoRI ' suspect sequence linker? 1— pHHl4*3 — 



ISAEFGFFEQVQPG. VQAEFRG<PSR I PRGPy 

GAGGrCAAGCCGCrCAGCAAGGCGCCTGAAGCGGCCGTGAGCGAAGATGGCAAArCG GACGACGAGCTGCrCTCCAGCAAGGCCAAGGCGCAAAAGAGCT 
CTCCAGTTCGGCGAGTCGTTCCGCGGACTTCGCCGGCAC TCGCTTCTACCGTTTAGCCTGCTGC TCGACGAGAGGTCGTTCCGGTTCCGCGTTTTC TCGA 



■PHH14-3 



r V K P I SK A P E A A V S E D G K S O OELLSSKAKAQKS 

CTGGGCCTGTCCCCTCTGCCAAGGGCCAGGAGGAGCGCGCCTTC CTCAAGGTGGACCCCGAGCTGGTGG TGACCGTGC TGGGAGACCTGGAGCAGCTGCT 
GACCCGGACAGGGGAGACGGTTCCCGGTCCTCCTCGCGCGGAAGGAGTTCCACCTGGGGC TCGACCACCAC TGGCACGACCCTCTGGACCTCGTCGACGA 



-pHH14-3 



5 G P V P S A K GQEERAFL KVDPELVVTVLGOLEQLL 

CTTCAGCCAGArGCTGGACCCAGAGTCCCAGAGAAAGAGGACAGTGCAGAATGTCCTGGATCTCCGGCAGAACCT GGAAGAGACCATGTCCAGCCTGCGA 
GAAGTCGGTCTACGACCTGGGTCTCAGGGTCTC-TTCTCCTGTCACGTCTTACAGGACCTAGAGGCCGTCfTGGACCTTCTCTGGTACAGGTCGGACGCT 



•pHHl4-3 



■QRF(1-573t:p;i = pLM7 ORF 



-full available ORr HU-Unc53/l s pLMl OR 



? SQHLOPESOSKRTVONVLOLRQMLEcTMSSLR 

'■' i ■ i i i ■ ■■■ i i - — . i * t . i . . i 

GG3 ~CCC AGGTGACTCACA3C TCCC TGG AG A "3 aCC TGC TACGaCAGCGATGATGCCAAZCC ACGCAGCGfuTCCAGCCTCTCCAACCGCrCGTCCCCTC 
CCC AG3GTCCAC TGAG 7GTCGAGGG ACC TC TAG 'G jACGATGC TGTCGCT AC T ACGG T TGGG TGCG 7 CGC AC AGGTCGG AG AG GT TGGCG AG CAGGGG AG 



-PHH14-3 



-QRF (' l»5?C-goj = pLM7 CRF 



3 o v r 



■! Jl available ORF HU-Unc52/l apLMi OR 

" C Y0SO0AN35SVS3L SMR3 5P 



'"^ ~'~ A - ^ A ^^GCCA 3^"CG AG r CCGCGGC " I C AGGC TGG TGACGCGCCC TC TG "GGG TGCGaGC TGCCGC TCGGAGGGG AC GCCCGCC TG GT AC AT 

-CAGTACCGCGA'ACCGGTCAGGTCACGCGC ::- ! G TCCGACC AC TGCGC GGGAGAC ACCCACCC 'CGACGGCGAGCC TCCCC TGCGGGCGGACCA TG Tm 



-PHH14-3 



. S a-.ailabi-j CRF HU-Unc53/: = pLM • C = 
: A C C a P j ; G G 



S E G r P a v Y 
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iic, Hu-UncS3/1 seq (1 >6013) Site a no .cuence 

GCACGGCGAACGGGCCC AC TAG TCCC AC ACC A TGCCCATGCQCAGCCCCAGCAAGCTCAGCC AT ATC fCCCGCC TGGAGC ToG rCGAATrcCTGGACTCC- 
C'j"GCCGCTTGCCCGGGrG4TGAGGGrGTGGTACGG3 7ACGCG7CGGGGTCGTTCGAGTCGG TATAGAGGGCGGACCrCGACCAGCTTAGGGACCTGAG;; ' " 

pHH14-3 

ORF j 1 -573&ft) s pLM'/ ORF ■ ~ 

fell available Oflr HU-UncS3/1 = pLMI OR 
HGERAHYSHTMPMRS PSKLSH I SRLELVESL03 



G A TG AGG TGG AC C TC A AGTCC GGC T AC A TGAGCG AC AG TGAC CTCATGGGCAAGACC A TG AC GGAGGATGATG AC A TC AC TACCGGCTGGGATGAAAGC A 

. , ! 1 1 \ ■ i ■ 1 1 ■ ' » - a-jo 

C7ACTCCACCTGGAGTTCASGCCGATGTAC TCGC TG TC ACTGGAGT ACCCG TTCTGG 7 AC TGCC TCCTACTAC TGTAG TGATGGCCGACCC TAC TTTCG" 



PHH14-3 



ORF (1-579ap) = pLM7 ORF 



f'jti available ORF HLMJncS3/1 = pLM1 OR — 

CcVOLKSGYMSDSOLMGKTMTEOODITTGVOES 



GC7CCATCAGTAGTGGACTCAGCGATGCCTCAGACAATCTCAGTTCAGAAGAATTCAATGCCAGCTCCTCACTCAACTCCCTCCCAAGTACTCCCACTGC 

, | I i ) ! - II- ■ III 1 O^O 

CGAGGTAGTCATCACCTGAGTCGCTACGGAGTCTGTTAGAGTCAAGTCTTCTTAAGTTACGGTCGAGGAGTGAGTTGAGGGAGGGnCATGAGGGTGACG 



pHH14-3 

^ -pCB212 — mm ^^^^^^^~ 

> 

ORF ■ 1 -STZtp) n oLM7 ORF 

:uil available ORF HU-Unc53/l ^ pLMI OR — 

35 I 5SGlS0AS:>«LSSEEFNA3SSLN3LPSTPTA 



TCGC AGGAACTCAACAA'AG 733 7aCGCACa:aC: AGAGAAGCGCTC AC TGGC AGAAAG TGGGCTGAGC TGG TTTAG TG AA 7C AGAGG AGAA AGC3 
aa3 A3CG7CC TTGAGrfG" TaTCACGAT 3CGTG"C "G AGTCTCTTCGCGAG TGACCG TCTTTCACCCGACTCGACCAAATCAC TTAGTC 7CC 7C77TCG3 



PHH14-3 



pCB212 

'^:i available Oflr HU-Unc53/1 = pLMI OR 

3RRN3T:v.-": 3EKR5UAESGLSVFSE3££VA 



C 1 " " AAAAAAC rGGAGTAC3ACAG73G*AGCC* GAi^i 'GGAACC7GGGAC f TC TAAGTGGCGGAGGGAGCGGCC7GAGAGC T3 7GA7GA7 7C A7CCAAG3 

i 1 1 » . 1- ' t 

■j'n'rrT fTGACCrCATGC "G TC ACC A7CGGA-" " ~ A3C T TGGaCCC 7GAAGA7T3 ACCGCC TCCC T3G3CGGACTC TCGAC AC TAC " AA 3 7AGG T7C z 



pHHl4-3 



PCS212 



— . ' -uxlatle ORF HU-Unc53/l = pLMI OR — 

• l £ * : s •: 5 - * v e ? z r s k ■ w p ? * 3 *■ •: : d c- : s > 



BNSDOCID: <WO 9824810A2_I_> 



WO 98/24810 ^ 7 PCT/EP97/06956 



Tuesday. 18 November 1997 10:33 r~f<\ *7 

tig Hu-Unc53/1 seq (1 >6013) Site ar sequence / P*£jeJ 

0 r 3GAGAAC TGAAAAAGCCCATCAGCC TcGGCCACCZ TGGTTCCC 'GAAGAAGGGCAAGACCCCACC TG TGGC fG fAAC T "CCCCCA TCAC i"CACA*Au** 

C -CC fCr TGACTTrTTCGGGTAG TCGGACCCGGTGGuACCAAGGGAC fTC F rCCCGTTCrGGGGTGGACACCGACATrGAAGGGGGTAG T"GAGrGTGTi"" ^ 



-pHH14-3 



-pC8212 



-full available OHF HU-Unc53/l = pLMl OR 



GG ELKKP I SL GHPGS L K K G KTPPVAVTSPITHTA 

CCAGAGTGCCCTCAAAGTCGCAGGCAAACCTGAGGGCAAAGCTACAGACAAGGGTAAGCTTGCAGTGAAGAAT ACTGGGCTCCAACGCTCCTCCTCTGAT 
GGTCrCACGGGAGTTTCAGCGTCCGrTTGGAC ^^C3TTTCGATG TCTGTTCCCATTCGAACGTCACTTCTTATGACCCGAGGTTGCGAGGAGGAGACT^ 



•PHH14-3 



-pCB212 



0 S 



-full available ORF HU-UncS3/1 a pLM1 OR 



L KV AGKPEGK A T D KGKLAVK NT GLQRSSSD 



GCTGGTCGGGACCGCCTGAGTGATGCTAAGAAGCCCCCC TCGGGCATTGC TCGCCCCTCCAC TTCGGGATCCTTTGGC TACAAGAAGCCTCCTCCTGCCA 
CGACCAGCCCTGGCGGACTCACTACGATTCTTCGGGGGGAGCCCGTAACGAGCGGGGAGGTGAAGCCCTAGGAAACCGATGTTCTrCGGAGGAGGACGG- ^ 



-pHH14-3 



-PCB212 



u.'l available ORF HU-Unc53/1 = pLM1 OR 



- G *C*13DAK K?=> SG IARPS T SGSFGYKKPPPA 

C A3GCAC AGCCAC TG TC ATGCAAAC TGGTGGTC AGCC ACTC TC AG C A AG A TC C AG A AG TCC TC AGGCATCCC TGTCAAGCC AGTAAATGGGCGCAAGAZ 
G-CC3TG rCGG7GACAGTACGTTToACCACCAA3"'C3GTGAGAGTCGTTC TAGGTCTTCAGGAGTCCGtAGGGACAGTTCGG TCATTTACCCGCGTTCT3 



■PHH14-3 



-pCB212 



T v M c T G 



-Jwil available CRF HU-UncSo/i spLMI OR 



A T L S K IOKSSGIPVKPVHSS 



--MrrAGArG-:TccAACAGr-cACAG:c^3- J A--::r3GC rcc tgg agcccgt tc taac a tccag taccgc agcc tgccccggzcagcc aag "c aag * 

-•-A A1TC * £C AAAC5 r 7G TC 4*G fC "CGG TC a^ 2 aCCGAGGaCC TCGGGCAAuA r TO rAGG TCATGuCG T"CGGACGCG3CCGGTCG3 ~ TC AG TTC- 



-pHH14-3 



-pCB212 



: - ' 3*3:labl* ORF HU-Unc53/i = pLMl OR 



_ ~ L> v - ; M - ; A z 5 ; r i " 0 A R 5 M I 0 Y * 3 I ^ R P A 



BNSDOCID: <WO 98248 10A2_I_> 



WO 98/24810 



# 
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Tuesday. 18 November 1997 10:33 
fig Hu-UncS3n seq (1 >6013) Site at iequenca 

rr TAfGAGCO TG AC CGGCGGGCGGGG T5SACC fCGCCC TG TGAGCAGC AGC AT TGACCCC A3 TC TCC TC A3CACC AAGC AGG3 AG3CC 77ACGCCT TCC A 
AGAfACTCGCAC TGGCCGCCCGCCCC ACCTGGAGCGG3ACAC TCGTCGTCG TAAC TGGGG rCAGAGGAGTCGTGGrTCGTCCCrcCGGAArGCGGAAGG" 



pHH14-3 — 

pCB212 — 

pC8210-14 - 

full available ORr HU-UncS3/1 = pLM1 OR 

} — r if v primer HU53rj«i L 
S M S V TGGRGGPRPVSSS I0PSLLSTK0GGLTP5 

GACTGAAGGAGCCTACCAAGGTAGCCAGTGGGCGGACCACTCCAGCCCCTGTCAATCAGACAGATCGGGAAAAGGAGAAGGCCAAAGCCAAGGCAGTGGC 
CTGAC nCCTCGGATGGTTCCATCGGTCACCCGCCTGGTGAGGTCGGGGACAGTTAGTCTGTCTAGCCCTTTTCCTCTTCCGGTTTCGGTTCCGTCACCa 



pHH14-3 



pCB21 2 

PCB210-14 - 

full available ORF HU-Unc53/1 - pLM1 OR - 

p L K E P TKVASGP T TP APVNGTDREKEK AKA< A V A 



CrrGGACTCAGACAACATCTCCTTGAAGAGTATTGGCTCCCCAGAAAGTACrCCCAAGAACCAAGCAAGCCACCCCACAGCCACCAAGCTGGCAGAGCTo 

, i | r I ) ■ I i'li ■ 1- ' 

GAACCTGAGTCTGTTGTAGAGGAACTTCTCATAACC3AGGGGTCTTTCATGAGGGTTCTTGGTTCGTTCGGTGGGGTGTCGGTGGTTCGACCGTCTCGA'J 



pHHl4-3 — — 

PCB210-14 

full available ORr HU-Unc53/i = pLMi OR 

L 0 30NI3L.<3:G5P'ESTPKMQASHPTATKLAEL 

CC ACC AACCCC TC TCAGGGCC AC A3CGAAGAGC "To **C AAACCACCC TCACTAGCC AATC T TGAC AAGG TCAAC T CC AACAG TC TGGA 7C ThCCATCA" 
GG'GGTTGGGGAGAGTCCCGGTG TCGCTTCTCGAAAC A3 TTTGGTGGGAGTSATCGGTTAGAAC TG'TCCAGT TGAGGT7G TCAGACC TaGATGCTAGTa 

pHH14-3 — _ 

pCB2lCM4 - 

: J\ a. arable CRF HU-Unc53/1 = plMt OR 

33rPL^ATAKB"vyP?SLAN , _j<vN3NStDLP3 



&3 °i 




BNSDOCID: <WO 982481 0A2_I_> 



• 



WO 98/24810 4 4 ft PCT/EP97/06956 
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Tuesday. 18 November 1997 10:33 t >4 J Page 5 

fie. .Hu-UncS3/1 seq (1 > 6013) Site, j Sequence u 

CCAO rGA T aCCACCC A JZC T7CAAAGG TCCCaGaTC TGC ATGC TAC AAGC IXAGCATC 733GGGCCC TC TCCC TTCCTGC " r: ACCCCC AG tccggca-:*j 
GGTCACT ArGGrGGGT ACGAAGrtTCCACGGTCTAGACGrACGATGTTCGAGTCGTAGACCCCCGGGAGAGGGAAGGACGAAGTG j'j'jG TCAGGCCGTGCi " ' ' 



pHH14-3 ■ « 

pCB210-14 

full available ORF HU-Unc53/1 = pLM1 OR
SS0TTHASJCVP0LHATSSASGGPLP5CFTPSPAP 



CATCCTCAATATTAAC TCAGCCAGC TTC TCCC AoGGCCTGGAGCTAATGAGTGGTTTCAG TG rGCCAAAAGAGACCCGCATGTACCCCAAACTCTCAGGC 

■ i 1 . ■ 1 ■ 1 ' ' 1 ' i i ■ ■ . i 22C* 

GTAGGAG TT ATA ATTGAGTCGGTCG A AG AGGGTCCCGGACC TCGATTACTCACCAAAGTC AC ACGGTT TTC TCTGGGCGTACATGGGGTTTGAGAGTCCG 



pHH14-3 —

pCB210-14 —

full available ORF HU-Unc53/1 = pLM1 OR

[ L N INSASrSQGLELMSGFSVPKETRMYPKLSG 



CTGCACAGGAGCATGGAGTCCCTCCAGATGCCAATGAGCCTCCCCAGTGCCTTCCCCAGCAGTACTCCCGTCCCCACCCCACCTGCTCCCCCTGCTGCTC 
GACGTGTCCTCGTACCTCAGGGAGGTCTACGGT7ACTCGGAGGGGTCACGGAAGGGGTCGTCATGAGGGCAGGGGTGGGGTGGACGAGGGGGACGACGAU 



— ■ pHH14-3 

pCB210-14 — 

full available ORF HU-Unc53/1 = pLM1 OR —
LM5SMSSLCHPMSLPSAFPSSTPVPTPPAPPAA 



CCACAGAAGAAGAG ACGGAAGA3CTGACTTGC-:"GGAA3CCCCAGAGCT3GGCAACTGGACAGTAATCaGCGGGATCGGAACACTCT7CCCAAGAAAG3 
GGrGTCTTCTTCTCT3CCTTC TCGAC TGAACC ACC TTCGGGGTC TCGACCCGTTGACC TG TC ATTAG TCGCCCTAGCCTTGTGAGAAGGuTTCT TT*ZZ 



pHH14-3 

pHH3b

pCB210-14

full available ORF HU-Unc53/1 = pLM1 OR

pep;jd 9 "sy2^23H — ■ 

r £ E i T I i „ T V; C SPP AGQL03NCRD3N T L 3 < > 




GC rc AaGTACCAGCTTCAGrCCCAGGAGGAjACCAAGaAoAGGCGACATTCCCATACCArrGGTGGCCrGCCTGAArc CgA TGACCAG tcag AGC TGCC 7 
CGAGTCCATGGTCGAAGTCAGGG TCCTCCTC'GG TTCC rCTCCGCTGTAAGGGTATGGTAACCACCCGACGGACTTAGGCTAC TGGTCAGrC TCGA'GGA 



■PHH14-3 



- pHH3b 



- re/ pf incr HU33fv 1 



-full available ORF HU-Unc53/1 = pLM1 OR



LRYQLQSGEE 



K E R R H S H TIGGUPESDDQSELF 



"CT~CCCCCrGCACTTCCCATGTCTC TGAGTGC»AAG33CCAACTTACC AACATAGTGAGTCCCACTGCGGCCA CCACGCCAAGAA TCACCCGCTCCAACA 
AGAGGGGGACGTGAAGGGTACAGAGACTCACGT7TCCCGGTTGAATGGTTGTATCACTCAGGGTGACGCCGGTGGTGCGGTTCTTA 26 ° 



-pHH14-3 



-pHH3b 



-full available ORF HU-Unc53/1 = pLM1 OR



$ p ? A L P M S I S A X Q Q L TN IV S P T A ATTPRITRSN 

GCArcCCCACCCACGAGGCGGCCTTCGAGCTG-ACAGCGGCTCCCAAATGGGGAGCACCCTGTCCCTGGCCGAGAGACCCAAGGGAA TGATTCGGTCAGG 
CGrAGGGGTGGGTGCTCCGCCGGAAGCTCGACA-GTCGCCGAGGGTTTACCCCTCGTGGGACAGGGACCGGCTCTCTGGGTTCCCTTACTAAGCCAGTcr " 7 °' : 



-pHH14-3



-pHH3b



-full available ORF HU-Unc53/1 = pLM1 OR

3 I P THE A , Ar£L ' S G5Q M G S T L $ L A E R P K G M t R $ G 

Ar -'- T fC=QAGACCCCACGGACGATGTrCACG::-CA5r3CT3rccCTCGCCTCCAGTGCCTCCTC 

^GOAAGGCTCTGGGGTGCCTGCTACAAGTGC-AGTCACGACAGGGACCGGAGGTCACGGAGGAGGTGGATGAGGAGTCGACTCCTCTCCTACGTrA.jA 

4 



-pHH14-3



-pHH3b



-full available ORF HU-Unc53/1 = pLM1 OR



: - R 0 P T Q C v H : 3 v l 3 L A 3 S A S 3 TYS3AEERM03 

'j^j'-AAA TCCGGAAGC TTC G'AGGoA^C TuGA- *I a *" IAG3AAAAAGTGGCCACC rTGACGTCTCAG C rTTC TGCCAATGC TAATCT"GG rGGCTGCT" 
C ■ ^.rrT^GGCCTTCGAAGCATCCCrrGACC — ::^A:::7C:TrrrrCACCGGTGGAACTGCAGAGTCGAAAGACGGTTACGATrAGACCACCGA:GAM 



- pHH3b 



-full available ORF HU-Unc53/1 = pLM1 OR
I 3 < •. = 3 f r < 



=• < V A T L r s o L 




T TGAGCA3AGCC TGGTGAATA TG ACATCCCGCC TGCGAC ACC TGGC AG AG ACGGC CG AGG AG AAGG AC AC TGAGC TGC TGGA X T TGCGAGAAACCA TAG* 

a ac tcgtctcggaccacttatac tgtagggcggacgc tg tggaccgtc tctgccggctcc tc ttcctgtgactcgacgacc taaacgctctttggtatc? 



-U2 ORF = pCB251 ORF



-pHH3b



-full available ORF HU-Unc53/1 = pLM1 OR



FEQSLVNMTSRLRHLAETAEEKDTELLDLRETID 

CTTTC TG AAGAAAAAGAACTCTGAGGCCCAGGC AG TC AT TC AGGGAGCCC TTAATGCCTC AGAAACCACACCCAAAGAAC TTCGG ATCAAG AGACAAAAC 
GAAAGACT7CTTTTTCTTGAGAC TCCGGGTCCGTC A3TAAGTCCCTCGGGAATTACGGAGTCTTTGGTGTGGGTTTCTTGAAGCCTAGT7CTCTGTTTT-3 



-U2 ORF = pCB251 ORF



-pHH3b



-full available ORF HU-Unc53/1 = pLM1 OR



FLKKKNSEAQAV IQGALNASETTPKELR I 3C R Q fl 
TCCTCAGATAGCATCTCAAGCCTCAACAGCATCACTAGCCATTCCAGCATCGGCAGCAGCAAGGATGCTGATGCGAAAAAGAAGAAAAAAAAGAGTTGGG 

' 1 1 ■ 1 1 1 1 1 ■ 1 ■ ■ 1 • 1 ' yzcK 

AGGAGTCTATCGTAGAGTTCGGAGTTGTCGTAGTGATCGGTAAGGTCGTAGCCGTCGTCGTTCCTACGACTACGCTTTTTCTTCTTTT-TTTCrCAACCC 



-U2 ORF = pCB251 ORF



-pHH3b



-full available ORF HU-Unc53/1 = pLM1 OR



5 S 0 S ISSLNS ITSHSS IGSSK0ACAKKKKKK5V 

'CTATGA GC TTCGAAGTTCCTTCAAC AAAGCGTTC A37ATAAAAAAGGGGCCCAAGTC AGCTTCCTCATAC TCGGATATAGAGGAGAT7GC TAC ACCCGA 
AGATACTCGAAGCTTCAAGGAAGTTGTrTCGCAAGTCATATTTTTTCCCCGGGTTCAGTCGAAGGAGTATGAGCCTATATCTCCTCTAACGATGTGSGC- 



-U2 ORF = pCB251 ORF



-pHH3b



-full available ORF HU-Unc53/1 = pLM1 OR



Vf£L3SSF,N <A-5 IX<GPK5A5SYS0I£E I A T F 0 

■: *c rrcAGCCCCcrcATCccccAAACTACioCi*: tac ag -3 ac tgcttcaccctccatcaagtcctccaccttgtcctccgtgggzactgatgt: 

'.»« GhAG "CGGGGGAG TAGGGGG" " TG AfGTC j *AIC - A3ATG TC TC r G AC 3 AAGTGGG AGG f AG T TCAGGAGG TGGAACAGGAG3CACCC3 TGACTACk'J 



-U2 ORF = pCB251 ORF



-pHH3b



-full available ORF HU-Unc53/1 = pLM1 OR



-?53P<_0-: 3 r £ " A 3 P S IK 3 3 T L S S V 3 TO 




ACCl .3CCC7GC7CACCCAGCCCCCC AC AC 'AGGC 7G T7CCA7GCAAA rGAGGAGGAGGAGCC AGA0AAGAAGGAGG7A7C33AGC 7GC3C rCTGAG" 
7GGC7CCCGGGACGAGTGGG7CGGGGGG7G TGA7CC3ACAAGG7ACG777AC7CC7CC 7CCTCGG7C TC7rCT7CC7CCA7AGCC7CGAC3CGAGAC7«"^ 



-U2 ORF = pCB251 ORF



-pHH3b



-full available ORF HU-Unc53/1 = pLM1 OR



T E G PAHP APH TR L F H A N E E E S P £ KKEVSELRSE 

rATGGGAGAAGGAAATGAAGCTTACAGACATCCGCTTGGAGGCCCTCAACTCTGCCCACCAACTGGATCAG CTTCGGGAGACCATGCACAACATGCAGT* 
ATACCCTCTTCCTTrACTTCGAATGTCTGTAGGCGAACCTCCGGGAGTTGAGACGGGrGGTTGACCTAGTCGAAGCCCTC 7GG7ACG TG 77G 7ACGTC AA 



-U2 ORF = pCB251 ORF



- peptide B72627H



-full available ORF HU-Unc53/1 = pLM1 OR



-U3 ORF = pLM5 ORF

LV EKSMK, LTQ,I3L£ALN S AHQLOQLRETMHSIMQL 

GGaGG 7GGACC7GC 7GAAACCAGAGAA7GACCGAC TG AAGGTAGCCCCAGGCCCCTCATCAGGCTC CAC TCCAGGGCAGG TCCC TGGATCATC 7CCAT TA 
CCTCCACCTGGACGACTTTCGTCTCTTACTGGC'GACrTCCATCGGGGTCCGGGGAGTAGTCCGAGGTGAGGTCCCGTCCAGGGAGCTAGTAGACGTAA* 



-U2 ORF = pCB251 ORF



-pHH3b



-full available ORF HU-Unc53/1 = pLM1 OR



-U3 ORF = pLM5 ORF



£ v Q t. L K A E .N Q = L K V A PGPSSGSTPGQVPG^SAL 

rcr7CCCCACGCCGC^CCCTAGGCCTG3:AC^:^^::AT^CCrTr:GCCCCAGTCTTGCAGACACAGACC^GTCACCCA7G3A7G^C^'^:A:-ACTTGT: 
AGAAGGGGrGCGGCGAGGGA7CCGGAC:3TGA:-GGG"AAGGAAG:CGGGGrCAGAACGrCTGTG7C T 3 G AC A G T G G G T AC C T AC CG 7 AG " a 7 G A AC c L* 



-U2 ORF = pCB251 ORF



-pHH3b



-full available ORF HU-Unc53/1 = pLM1 OR



-U3 ORF = pLM5 ORF



j S p , P °- SLGLAL'-SFSPSLACr:-. SPMDG 



T C 




G r ~C AAAGGAGG AAG TGACCCTCCGGGT3GT3G"GAGGA TGCCCCCGC AGC ACATCA7C AAAGGGG AC 1" TGAAGCAGCAGGA A f 73 T MC TGGGC T37AG 
C AGG T TTCC TCC T TC ACTGGGAGGCCCACC ACC AC TC C TACGGGGGCG TCG TG TaGTaGT TTCCCC TGAAC TTCGTCG TCC X TaaGaaGQACCCGACATC 



U2 ORF = pCB251 ORF



pHH3b



full available ORF HU-Unc53/1 = pLM1 OR



U3 ORF = pLM5 ORF

GPKEEVTLRVVtfRMPPOHIIKGOLKQQEFFLGCS 



C AAGGTC AGTGGAAAAGTTGACTGGAAGATGCTGGaTGAAGC TGTTTTCC AAGTGTTCAAGGACTATATTTCTAAAATGGACCCAGCCTCTACCCTGGGA 
G TTCC AG TCACCTTTTCAACTGACCTTCTACGACC TACT TCGACAAAAGGTTCACAAGTTCCTGATATAAAGATTTTACCrGGGTCGGAGATGGGACCCT 



U2 ORF = pCB251 ORF



pHH3b



full available ORF HU-Unc53/1 = pLM1 OR



— U3 ORF = pLM5 ORF — 

KVSGKVDVKMLCEAVFQVFKDYISKMOPASTLG 



C rAAGCACTGAGTCCATCC ATGGCTACAGCA'C A jCC AC jTGAAACGAGTGTTGGATGCAGAGCCCCCCGAGArGCCTCCTTGCCGTCG AGGTGTC AATA 

1 — 1 ' ' ' 1 1 • ■ i ■ i | 

G AT TCGTGACTCAGGT AGGTACCGATGTCGT AG "CG3T3C AC TTTGCTC AC AACCTACG TC TCGGGGGGCTCTACGGAGGAACGGCAGC TCC AC AG TTAT 



U2 ORF = pCB251 ORF



pHH3b



U4 ORF = pCB201 ORF



full available ORF HU-Unc53/1 = pLM1 OR



U3 ORF = pLM5 ORF



pHH15

- S r E S I H G y 3 I S v K P v L 0 a E » > £ ft P P C R = G V n 






-•J A " A TC AG TC "CCCTC AAAGGTCrGAAGGAji;i"3Z3 TCG ACAGCC TG3 TG T TCGAGAC 3CTGA "CCCC AAGCCGA TGA TGC A CC AC "-"A r AA jCC * 
TO'ATAG TCAGAGGGA3 TJ TCCAGAC "TCC " *" "ACGCAGC rGTCG3ACC ACAAGC " r:CGAC ta:3GG r TCGGC ta; "a:g::g "A'TCGGA 



pHH3b 



U4 ORF = pCB201 ORF



full available ORF HU-Unc53/1 = pLM1 OR



U3 ORF = pLM5 ORF



pHH15

/ .; j _ v F E : l - ► P f v 'j - i :.• L 




OZ' .Z rG-AGCACCGGCGCC TCGTCCrc fCGGGCCCC A 3 CG G CACGGGC A AG ACC T ACC rGACCAA r CGC T TGGCCGAG "AC Z I"GG TGGAGCGCTC TGG" 

G'jACGAC T rCGrGGCCGCGGAGCAGGAGAGCCCGCGGTCGCCGTGCCCGTTCTGGATGG4CTGGTr^GCGAACCGGCTCA7GGACCACC~CGC34GA , '"<:' : 



-U2 ORF = pCB251 ORF



-pHH3b




-full available ORF HU-Unc53/1 = pLM1 OR



-U3 ORF = pLM5 ORF



-pHH15



LLKHRRLVL5GP5GTGKTYLTNRLACYLVZRSG 



CGTGAGG rCACAGAGGGCA TCGTCAGCACC "CaaCaTGCACCaGCaGTCT i GCaauSa \ C I GC AAC TG"f a "re TTTCCAACCTAGCCAACCAGATAGACC 
GCACrcCAGTGrCTCCCGTAGCAGTCGTGGAAGTTGTACGTGGTCGTCAGAACGTTCCTAGACGTTGACATAGAAAGGTTGGATCGGTTGuTCTATCTGG 



-U2 ORF = pCB251 ORF



-pHH3b



-U4 ORF = pCB201 ORF



-full available ORF HU-Unc53/1 = pLM1 OR



-U3 ORF = pLM5 ORF



-pHH15



NMHQOSCKOLQLYUSNUAHQIO 



GGSAAAC AGGAA7TG3GGA fGTGCICC TGG'Ga'TCTaT TGGATGAC C TGACTGAAGCAGGC TCCArCAGTGAGTTGGTCAATGGGGCCCTCACCT GCaA 

: ; : r r tg tcc r - aacc : : r ac ac gg gg acc a : ' - aga t a acc tag tgg ac tcac t tc g tc cg agg r act c ac t ca acc ag t r acc ccggg ag tggacg r r 



-U2 ORF = pCB251 ORF



-pHH3b



-U4 ORF = pCB201 ORF



-full available ORF HU-Unc53/1 = pLM1 OR



-U3 ORF = pLM5 ORF

-pHH15

i 

1 i j : v D - ' - : - C D L 3 £ A G 5 3 E L V *J G A . r c ► 




0'a7ca. A^ATGrcccrATATTATAGGrACCAc:AAr:AGCcrGTAAAAArGACACCCAACCArGGcrrGCAcrrG^C7rcA3GATor7GACC^Ti-c: 

C i r AG T A T 7 T AC AGGG A T A T aATaTCCATGGTGG 7 7AGTCGCACA TTTT T AC TGTGCG TTGG FACCG AACG TGAAC TCGAAG fCC TACAAC TGcAAjAGU 



pHH3b 
U4 ORF = pCB201 ORF



full available ORF HU-Unc53/1 = pLM1 OR



U3 ORF = pLM5 ORF



pHH15



pup-ide B72S2*H 

Y H K C P Y 1 IGTTNQPVKMTPN HGLHL5FR ML TF3 

AACAACGTGGAGCCAGCCAATGGCTTCC TGGTTCGTTAC CTGAGGAGGAAGCTGGTAGAGTCAGACaGCGACATCAATGCCAACAAGGAAGAGC tgcttz 
7TGTTGC ACCTCGG TCGGTTACCGAA GGACCAAGCAA TGGACTCCT CCT TCGACCATCTCAG TC TGTCGCTGTAGTTACGGTTGTTCCTfCTCGACGAAG 

U2 ORF = pCB251 ORF



pHH3b
U4 ORF = pCB201 ORF
full available ORF HU-Unc53/1 = pLM1 OR
U3 ORF = pLM5 ORF



_ pHH15 — 

\ N V £ P A M G F L V 3 Y L R R K L V E S 0 S 0 tMAN KE ELL 

j .» J "oC TC'j -C"G33 T AZCC AAGC To a'Ca TC C AC ACC T TCC 7TG AGAAGC ACAGC ACC TC ACaC TTCC TC »TCGGCCC TTGC T TC TTTC TGT" 
v" C AC3AGC TGACCC" T^GG'TCGACACCATAQ'aGAGG TGTGGAAGGAAC rCTTCGTGTCGTGGAGTC TGAAGGAG7AGCCGGGAACGAAGAAAGACft2 



U2 ORF = pCB251 ORF



pHH3b

U4 ORF = pCB201 ORF

full available ORF HU-Unc53/1 = pLM1 OR



U3 ORF = pLM5 ORF



pHH15

ht" ._ekhs"3CFl : -3 p c 
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If age i' 
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f -'9 _ -U-Unc53/1 seq (1 >6Q13) Site an. acuence 






0 "G ~ZCC AT TGGCA7TGAGGAC r 7CCGGACC 7GG 7TC A T" TGA CC TG 7GGAACAAC fC 7ATCA 7TCCC fA": 7A:AGGA.iGGA3C: AAGGA FGG"ATiA • 
' : ^" A3 ' 3G rA [ A ' : .! C .? rA " CTCCTGAAg3CC ^QGACCAAG TAAC TGGAC AC C T T G T TG AG A TAG T AAGG G A 7 A3 A T G r C C 7 T C C T C GG T T CC 7 AC CC 7 A X ' T ' 



-U2 ORF = pCB251 ORF



-pHH3b



-U4 ORF = pCB201 ORF



-full available ORF HU-Unc53/1 = pLM1 OR



-U3 ORF = pLM5 ORF



-pHH15



C ? t G I E D FRTVf I 0 L V N N S ( IPYLQEG 



^ 0 G [ } 



G 7CC ATGGAC* 



«AAGC TGCTTGGGAGGACCCAG ~GG AA iGuG i CCGGGaC AC AC TTCC C TGuCCAICAGCCCAACAAGACCAATCAAAGC TGfACCAC" 
CAGGTACCrGTCTTTCGACGAACCCTCCTGGGTCACCTTACCCAGGCCCTGTGTGAAGGGACCGGTAG7CGGGTTGTTCrGGrTAGTT-CGACATGG-G: 



-U2 ORF = pCB251 ORF




-full available ORF HU-Unc53/1 = pLM1 OR



-U3 ORF = pLM5 ORF



-pHH15



:< a a 



E 0 P v 



V*DTL*w»S4QQD0SXLrH 



''JLCCCC ACCCACCoT 3GGCCCTCAC AGCAT7GCC ~Z ACC TCCCGAGGATAGGAC AG 7CAAAGACAGCACCCC A AGT7CTCTGGACTCAGATCC"CTG.c" 
- C *J G 3 GG T GGG "GG C ■ C C C GGG AG T G TC G T A AC 3 



-U2 ORF = pCB251 ORF



-pHH3b



-U4 ORF = pCB201 ORF



-full available ORF HU-Unc53/1 = pLM1 OR



-U3 ORF = pLM5 ORF
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£2. Hu-UncS3/1 seq (1 > 601 3) Site a Sequence 






QkjZCxtgc "oc 7gaaac ttcaagaagc tgccaac t~ac a r tgag rc *cc aga tcgagaaacC" 7cc r GQAcccc aacc * rz agscaacac r rr AA2cgrr~ 

CCGG rACGACGACrtTGAAGTTC 77CGACGQ773 A7G7AAC TCAGAGGTC 7AGC7C T 77GG T AGGACC 7GGGG 7 7GGAAG "CCGTTGTGAA * "TCC * ""^ ^ 

3. ~ 



-U2 ORF = pCB251 ORF
-pHH3b




-full available ORF HU-Unc53/1 = pLM1 OR



-U3 ORF = pLM5 ORF




-peptica 372G25K 



A NL L K ^ Q E ' A AN Y I £ S P PR ETlLDPNLQATL.GF 



GGCAArCACTG7 CACCCCCGGACAGCAGAACGC^GGCATCAGCTA7CrrAGCTCCTCCrCTCCCC7CTCCrCTTTCAGAGCACTGGCTC7CCAGCC CCAG 

CCGTTAG TGACAGTGGGGGCCTG tcgtcttgcgaccg tagtcgatagaatcgaggaggagaggggagaggagaaagtctcgtgaccgagaggtcggggt^ 



-pHH3b 



-pHH15



H ? R T A 



£ R - HQLS .LLLSPLLFQ 



S 7 G 5 P A P 



■ ^-^ijAGAAC ^0CA3GGAGGACGAGA7GAAAoAG3 AGG3ACAG37 TC 77G 



G7GCTGTACC77T5AGAAC7-CCTAGGAAGGAATGGrG GGGrGGCGTrTGG 
: ' :r - TT ^""^C7CC7C7AC^^ 



-pHH3b 



•J 3 E 



3 G 3 0 £ 3 



pHH15

r g s v c 



f £ N F L G R M G 



7 A F G 



T ^ AA **ACA777AC73GCCTt T*C ? AATGAC 7 7rG0G3AAAAGA~GATTC7GGGTC 777CCC T7QAC77C77G77rCAA77ACAA£;; 
^JCAACACGGG^r TG TG TAAA ^GACCGGA-AG AT fAC TGAAACCCC TT ^ rCTACfAAGACCCAGAAAGGGAAC TGAA3AACAAAG r rAA'G 77T: 



■ pHH3b 



L C ? L N 7 F 7 G I 



-pHH15



L V G K 0 0 5 G 5 F 



L L V S [ r M 



r33G:7 7T.:73 33GAGGGG7TCAC-AAAiCi*: 



:ac tgcaccag rrccTAAATGAr rc:: 



iGCAACCC 'GAGAGAGACAG " 



7GAG'j 



•i:;ACCC3AAAGA:::C7CCCCAAG7C7777G:;:-"To7GACG7C3rCAAGGA7T7A:7^ 



• pHH3b 



4 



' : G T 7GG G AC 7 C 7 C 7C TG 7C i G i A ; AC TG C : 



-pHH15 
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tig Hu-UncS3/1 seq (1 >6Q13) Site and Sequence ' 

AGATC TGGGGGAGGCAGGAAGCTCCTCAGATTTTC TCACAGACCCTTCCC AATTCCaTCACC ACTGCCAACAACTCCTCCCCCAGAGATCTGGCTGGA'jC 
TCTAGACCCCCTCCGTCCTTCGAGGAGTCTAAAAGAGTGrCTGGGAAGGGrTAAGGTAGTGGTGACGGrTGTTGAGGAGGGGGTCTCTAGA-CGACCTr*: 

pHH15 — » 

E IVGRQEAPQ IFSQTLPNS I TTANNSSPROLAGA 

CCAGAAAAAGAAGCATGTGGTTTAAAAAATGTTTAAArCAATCTGTAAAAGGTAAAAATGAAAAAACAAAAACAAGCAAACAAACAAAAAACAATGGAAA 
GGTCTTTTTCTTCGTACACCAAATTTTTTACAAATTTAGTTAGACATTTTCCATTTTTACTTTTTTGTTTTTGTTCGTTTGTTTGTTTTTTGTTACCTTr 
aKKKHVV.KHFKSICKR.K.KNKNKQTNKKQy» 

AGATGAAGCTGGAGAGAGAGGAACCAGTTGCCAAGGTAGAGAGCTGCCCGCTCCTGCCCTCTGGATGACATAGGGGACATCAACAAGACGGCTGCCAACL' 
TCTACTTCGACCTCTCTCTCCTTGGTCAACGGrTCCATCTCTCGACGGGCGAGGACGGGAGACCTACTGTATCCCCTGTAGTTGTTCTGCCGACGGTTGG 
R SVRF.RNQLPR RAAftSCPLDD IGD I NKTAAM 

TGAGAAGTCACCAAACCACAAAAATAAC CTTACAGCCTTCAGGGAAAGACTACCAGCTCTGTCTTTCTACCCTCTAATTTAACAATGCACCGGAATTCA5 
ACTCTTCAGTGGTTTGGTGTTTTTATTGGAATGTCGGAAGTCCCTTTCTGATGGTCGAGACAGAAAGATGGGAGATTAAATTGTTACGTGGCCTTAAGTC 

Linker? - 

LR SHQTT K1T LQPSG K0 YQL CLSTL.FNNAPEF5 

CT7GGACTTAACC 

i 1 6013 

GAACCTGAATTGG 



2 



— linker? 
L 0 L T 
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GflflrTCCDCCC«BGflCCTCflCnwaGTCCRflCCCCC«CflCCTCGCCflflCTCGRCaCT«RrCfiGCCCCflTCCCflflCRCrcrrccWflCMflCCKTCTO 100 
NSGC£lTvSGSPRflGQlOSNGRORNTLPKKGiQ 
f A AT i# . M 

TBCCRGC TTCRGTCCCBG WGSPGPCCRPGGPGBGGCGBCB rTCCCBTflCCBT TGGTGGGCTGCC TGPfl TCCGfl ICRCCRG fCflGPCC fCCCTTC TCCCC 200 
V Q L Q S 0 C £ T < CRR h S HTIGGLPCSOOGSCLPSP 

CTGCftCTTCCCRTGTCTCTG«;GCaflPGGGCCflPCTTflCCPPCflrRGTGflGTCCCRCTGCGGCCnCCflCGCC«W«RrCBCCCuCrCW«PGCnTCCC 300 
PflLP nSLSAKGQL T N 1VSP TRRTTPRI f Q $ h S IP 

CPCCCflCCPGCCGGC: HC5SGC TG TRC-GCGGC TCCCPPP TGGGGBGCPCCC TG TC C C * GG CCG RG RG RCC CP BGGGR P TCP TTC ZZ rCBGGBfCC TTC *00 
rMEflflfCLYSGSQnGSTLSLBCRPlCGfljasGSf 

CGPGPCCCCPCGGPCGP:GrTC?CGGCrCPGTGCrGTCCCTGGCCrCCRGTGCCTCCTCCPCCrPCrCCrCPGCTGPGGPGRGGP7GCPPTCTGAGCRPP 500 
kup fddVM GSViSLRSSflSS rYssHEcanosco 

homology block A

rCCGGARGCTTCGrPGGGPflCTGGPPTCPTCCCPGGPPnPRGrGGCCPCCTTGPCGTCTCPGCTTrCTGCCPPrGCrRPrCTGGrSCCrGCrrrTGPGCP GOO 

i a g i RftciESSQCK vftTi rsot.sRNBWLvapreo 

homology block A

GPGCCTGGrGPRTPTGRCaT:::^:c:GCGaCflCCTGGCRGRGPCGGCCGPGGPGPPGGflCPCTGPGCTGCTGGfiTnGCGPGPPPCCarPGRCTTTCTC 700 

homology block A

flPGPPRPPGPRCTCTGPGGCrCPiGCPGTCSTTCPGGGftGCCCTrPBrGCCrCPGPflPCCRCPCCCBPfiGRRCTTCGCftTCRPGfiGacaBflPCTCCTCPG 8C0 
ClCXMSe33avtOGBLNBSCTrPKCLRIK4QNSS 

PTBGCBTCrCRPGCCTCPPCSCCaTCPCTPGCCRTTCCPGCRrCGGCPGCPGCPPGGRrGCTGPTGCGRPPBBGBPGflPBAPPPPGPGTrGGGTCTnrGB 900 

osissLNsrrsHss (gsskororxk kkk^svvvc 

homology block B

GCTTCGRPGTTCCTrCMCassCCSTTCSGrprflflpflRRGGGGCCCPPGTCRGCrTCCTCPrPCTCGGRTRTflGPGGPGPrTGCTPCRCCCGPCTCTTCB 1000 
LBSSFwcsrs tKKGPXSPSSVSO ICeiRTPOSS 



homoogy Dkx* a 



GCCCCCTCRrCCCCCPPPC"C3SCaTGG:TC:aCPGRGRCTGCTTCBCCCTCCBTCPPGrcCTCCBCCTTt;TCCTCCGTGGGCac?GPTGTCPCCGRGG IIOO 
PPSSPrw:«3STCTBSPSI<SSTLSSvG^0VTt 

GCCCTGCTCPCCCBGCCC:::=L-C*PGuC7GrrCCPTGCRPRrGPGGPGGPGGBGCCRGPGflPGBBGGBGGTBTCGGPGCTGCGC:CTGPGCTBTCGGR 1200 
GPRKPRP-TflLFKBHCEEEP CKKCVSClRSClVg 

homology block C



GPPGGBnBTGPftGCnPC?G3CS"CGCT1GCaGGCCCTCPBCTCTGCCCflCCRflCTGGBTCPGCTTCGGGBGBCCBTGCPCPRC»rGCBGTTGGBGGrG 1300 
K C n K l T Q I 9 i ERLNSBmQLOQLRC TftHNflQtCV 

homology block C

GPCCrGCrGPB»GCPGPGSR'Ss::GaC-GPflGGrPGCCCCPGGCCCCTCRTCPGGCTCCRCTCCPGGGCPGGTCCCTGGPrC«?C'GCPrTRTCrrCCC l«CO 
OLL KflCsra. <VPPGPSSGS TPGOVPGSSPlSS 
homology block C

m^m^^^tm weak nomoogy m Murm *i ncr>i 

cacGccGcrcccrPGG::':::-:*;s:::5--ccrrcGGCCCC^GrcrTGC3GPcacPGficc:GrcPCCCBTccprcGCP r CAGrscrTGrGGTccflPP isoo 

*»RPSLGv i .*-«SfGPSLPOTOtSPnOGISTCGP* 

GGacGPBG:GPccc r c:-:: *::*;;•;- ;:p -^cccccgchCC jrcprcaflRGGCG^c rTGSRGCRGcnGGPflTTC'Tc; roc:: *g trgcpr&gtc isoo 

t E v r c » . , s-pPQriii^GOLKOOEfrLiCSCV 



JG^GGPflRPGr rcRCMc«:s':: " * is-ic tgtt nccnPG ncRRGGae rs-ar ttc tsspp :ggpcccpc:c *: *-;::MuCbctrbgcp i roo 

3 G k v 0 » » - . ; CAvfQvriQv IS^nQPtti' -. G I S 

"orroioqv etoc* C 

> : CHGrccp:ccfl f GiC'-:=;:-*:s;:: 3 c^*:ppRCGPG'G:»3GR(GCPCPGCc::::-:PGp:GCCTccr :cc" ■c:=;c"":*R r RflCRTRrc >aoo 

r£Sl.H5«s:$«tfKRvi.0RePPCnPPCaa;<«NtS 
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flCTCTCCCTCnBnCGTCTG5flCGSCflPflTCCCTCCflCflCCCTGCTCTTCCACPCCCrcn-CCCCflfiCCCCRTC»TCCPCCfiCTRC«rnfiGCCTCCTGCT^ 1900 
V S L * C(.«CCC VQSLVfCTLlPtPnnQMVIStLL 
homology block E • pred nucleotide BD



RRGCRCCGGCGCC TC5TCCTCTCi;:ccCCP[lCGGC«CCGCCfl»CflCCTftCCTGflCC«PTCGCTTCCCCGRCTRCCtCCTGGflCCCCTCTCCCCCTCBC<; 2000 

«HBBlVlS5PSGTG«TVLTii»L«gyt.vCgS6BC 
homology block E • pred nucleotide BD

TCPCRGPGGGCPTCu*C a CC 5 CC " r ZSPCPTGCRCCRGCRG TCTTGCflRGu aT C r GCPP" 7 GTRTCrTTCCflRCCTRGCCRPCCPGRTRGflCCGGGRR:RC 2100 
VTCG |vS? f «nHQ0SC<3LQl V L SNLRHQ1QBC T 

homology block E • pred nucleotide BD

PGCflflTTGGCGarGTCCCCC ?CC 7GRT Tt TRT7GGRTGRCC TGRGTGflflGCPGGCTCCRTCflGTGflG rTGGTCBRTGGGGtCC TCflCCTGCABGTflTCAT 2200 
G IGOVPIV I IIOOISEBGS 1 S £ L VMGftt T C < V H 

homology block E • pred nucleotide BD

RRPTGTCCC TPn*rrPTBGG TRCCaCCftflTCRGCCTGTRRRRflTGRCfiCCCRflCCRTGGC TTGCRCTTGRGCTTCflGGaTGnGflCCTTCTCCRRCRRCG 2300 

KCPVt lGTTMQPVgflTPMHGLHLSFPHLTfSNH 
homology block E • pred nucleotide BD

TGGBaCBGCCBATGGCTTCCrGGTTCSTTacCTGAGGflGGaj^ 2«00 
VCPaiGrt vBviRRiCLVtSOSO IMBNKCELLWVC 
homology block E • pred nucleotide BD

CGPC rGGGTflCCCPPGCTG*3CraTCfl TCTCCPCflCCTTCC nGPGBRGCRCSGCflCCTCBGRCTTCCTCSTCGGCCCTTGCTTCTTTCTGTCGTGTCCC 2500 
QVVPKLVVHCHTriCKHSTSOrLIGPCr'fLSCP 
homology block E • pred nucleotide BD

RTTGGCflrTGflGGflCT:CCGG?CC:SGT7CR7TCaCCTGTGGABCaflCrCTflfCBrTCCCTRTCTflCflGGBBGGBGCCBflGGBTGCGBTB«BGGTCCflTG 2600 

IG I COfBTyr I 0 L VNWSI IPVLOEGRcQGIKVM 
homology block E • pred nucleotide BD

GRCBGBfiPCCT^CrTC^GAGGSCCC^GTGGfiBTGGGTCCGGGSCRCRCrTCCCTGGCCflTCflGCCCflRCRRGflCCRRTCfiRflGCTGrnCCflCCrGCCCCC 270O 
GOKRavC0gVgVVgQTLPVPSBO0OQSlttVMt.PP 
homology block E • pred nucleotide BD



RCCCRCCGTGGGCCCTCflCRGCBr2C:rCSC:;cCCGflGGB:RGGflCRGTCaflRGBCflGCaCCCCaflGrTCTCTGGRCKRGflTCCTCTGRTGGCCflTG 2800 
P T V G P M S : *5 P PEDBTVCOSTPSSLOSOPinRn 



homology block E • pred nucleotide BD


CTGC TG RRR C T 7X ?BG PR GC TCCC.^flC 


TPC P ; TGBGTCTCC PGR TC GR GRRRC CR TCC TGGPCCCCRRC C TTCRGGCRPCRC T T7RRGGGy*CGGCRRTC 2900 


LLKLOER*** 


v ifCPOfifTlLOPMLQRTlI 


.■Non*acgy Beck £ • p*«d nuci«ottce 80 / 


RCTGTCRCCCCC5G3CPGCRGSflCSCT 


GGCRrCPGC TRTCTTPCCTCC TCC T C TCCCC TC TCCTC T7TCRGRGCRC TGGC TCTCCRGCCCCRGGRGGRGR 3000 


3' untranslated trailer


RC PGGPGGGPGGPGCPGP rc-*R j?CG 


SGGGPCPGGrTCTTGG^GCTGTRCCTTTGPGPRCTTCC Tfl GG R RG GR R f GG T GC GG TGG CG T T TGG GR RC TTG 3100 


3' untranslated trailer


tgccccctrprcpcrt , !-:pc 'zz:z *: 


:-:^3rGPCrrTGCCGARflflGRTGPTTCTGGG-CnTCC:TTGRCTTCnGrTTCRRTTRCRPPCTCCTGGG 3200 


3' untranslated trailer


CTrTCrGCGGPSCSCTrC3G=P"C" 


:3Sce;qcTGCRGC-GTICCTflflfl;GPTTCTCPC3PGCPPCCCTGPGRGPGPCPGTCrTGTGRGGGPGRTCTG 3300 


3' untranslated trailer


GGGGPGGCRGCSPGC TCC TCP:* " 


"CPCaGPCCCTTrCCPRTrcCRTCPCCPCTGCCPBCRPCrCCTCCCCCRGSGaTCTGGCTGGRGCCCRGRRfl 3<*00 


3' untranslated trailer


prgpbgcrtgt:c"" t -^ = p! : s' j - ** 


:s3*;sarcTGTPPPPGG:SPPflfll^PPRPRPCPPPPflCPPGCSPflCPRflCaflPPR«CPRTGGPS«RGRTGRfl 3500 


3' untranslated trailer


gc :GCPGPGPG-:-:^c:=Gr-:::ii 


::—;i:=gc recces: *ccrGcc:*:*!;GarG=CP"PGs:cPcarcppCPp , :PCGGCTcccppcc tgpgbbg 3600 


3' untranslated trailer


TcPc:ppocc ; »c--sB4:apc: - 


•*;s.:CGaRaCaC-=CCPGCIC*G*CrTrc*aCCC TC tap TfTPRC bp *GcaccGGPPTrcaGCFrGGRC 3700 


3' untranslated trailer



fTPPCC 3?06 
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figS4pLMi { » > 8285) Site and Sequence Pa 3" 
Enzymes : 72 of 146 enzymes (Filtered) 

Settings : Circular, Certain Sites Only, Standard Genetic Code

GfGGCACT TTTCGGGGAAATGrGCGCGGAACCCCTATrTSTrTATTTTTCTAAArACArTCAAATATGrArCCG CrCA rGAGACAATAACCCrGATAAAr " 

CACCGrGAAAAGCCCCrTTACACGCGCCTrGGGGATAAACAAATAAAAAGATTTArGrAAG7rTATACATAGGCGAGTACTCrGTTArT3GuACTArTTA ^ 
GGTFRGNVRGTPICLFF. IHSNMYPLMRO.P. . M 



G'.rTCAATAATATTGAAAAAGCAAGASTATGAG rATTCAACATTTCCGTGTCGCCCTTATTCCCTTr rTTGCGGCATTrrGCCrTCCrGTrrTTGCTCAC 

C-jAAGTTArTATAACTTTTTCCTTCTCATACTCArAAGrTGTAAAGGCACASCGGGAATAAG-JGAAAAAACGCCGrAAAACGGAAGGACAAAAACGAGTG ^ 
L Q ■ y ■ K R K S M S 1 Q H F R V A L I P F F A A F C L P V F AH 

CCAGAAACGCTGGTGAAAGTAAAAGATGCTGAAGATCAGTTGGSTGCACGAGTCGGTTACATCGAACTGGATC TCAACAGCGGrAAGATCCrTGAGAGTT 
GGTCTTTGCGACCACTTTCATTTTCrACCACTTCTAGTCAACCCACGTGCTCACCCAATGTASCTTGACCrAGAGrTGTCGCCArTCrAGGAACTCTCAA 300 
PET LV lCVK OA eOQ LGARVGY lEL OLNS G K I L E $ 

TrCGCCCCgAAGAACGrTTTCCAATGATGAGCACrTTTAAAGrTCTGCTATGTGGCGCGG TATTATCCCGrATTGACGCCGGGCAAGAGCAACTCGGTCG 
AAGCGGGGCTTCTTGCAAAAGGTTACTACTCGTGAAAATTTCAAGACGATACACCGCGCCATAArAGGGCArAACTGCGGCCCGTTCrcSTrGAGCCAGC ^ 
F R P E £ R F P n M j T F K V L LCG AVLSR I OAGOEQLGR 



CCGCArACACTArTCTCAGAArGACTTGGTTGAGTACTCACCAGTCACAGAAAAGCATCTTACGGATGGCArGACAGT>UGAGA ATTA TGCAGTGCTGCC 
GGC G TA TG TG A T AAGAG TC T T AC T G AAC C AAC T C A TG AG TG G 7 CAG T G TC 7 T T TC GT AG AA TGCC T ACCG TAC T G TCA T TC TC f TAA T AC G TCACGACGG 
n ' H . v S QNDCVEYSPVTEKMLrOGHTVRELCSAA 



ArAACCATGAGTGATAACACTGCGGCCAACTTACTTCTGACAACGArCGGASGACCGAAGGAGCTAACCGCrTTTTTGCACAA CATGGGGGATCArGTAA 
rArTGGTACrCACTATTGTGACGCCGGTTGAATGAAGACTGTTGCTAGCCTCCTGGCTTCCTCGATTGGCGAAAAAACGTGTTGTACCCCCrAGTACATT 6 °° 
1 T H 5 0 N T AANLLLTT IGGPKELTAFLHNMGOHV 



CTCGCCTTGATCGrTGGGAACCGGAGCTGAATGAAGCCArACCAAACGACGAGCGTGACACCACGATGCCTG rAGCAArGGCAACAACGTTGCGCAAACT 
GAt;CGGAACrAGCAACCCTTGGCCTCGACrTACrrCGGTATGGTTT5CTGCTCGCACTGTGGrGCTACGGACATCGTTACCGTTGTTGCAACGCGTTTGA 
r R L D R W £ P E i, N E A | P M 0 £ R D TTMPVAHATTLRKL 



ArTAACTGGCGAACTACrTACTCTAGCTTCCCGGCAACAArTAATAGACTGGATGGAGGCGGArAAAGTTGCAGGAC CAC TTCTGCGC TCGGCCCTTCCG 
rAATTGACCGCTTGATGAATGAGATCGAAGGGCC3TT3rTAArTArcrCACCTACCTCC2CCrATTTCAACGrccrGGrGAAGACGCGAGCCGGGAAGGC 8 °° 
L tGELLTLA SROQL ! DVHE A9KVAGPLLRSALP 



3CrjGCTCgrTTArTGCTGATAAATCTGGAGCCGGTGAGCG:GGGTCTCGCGGTATCA7TGClGCACrGGGGCCAGATG G TAAGCCC TCCC5 TATCG TAG 
".' \»A'.. Z'jACCAAA IAACGACTATTTaG-CC TC33CCAC 7C3C A"C A3 AGCGCCATAGTAAO"CGTGACCCCGGTCrACCATrCGGGAGGGCATAGCA*C 
' 1VF|A Q<SGAG£PGSRG llAALGPOGKPS^IV 

^-^"G^CGGG-jAGTCAGGCAAC 74TGGA7GA ACGAAATA 3ACAGATCCC TGaGaTaIGTGCCTCAC TCAT7A AGCATTGG TAAC 7 j TC AGACC A 
aa, *a-;a 70 7JCTGCCCC rCAGTCCGTTjA'ACC TAC * 7GC7 TT a" 73 7C f-CCGAC TC Ta 7;CACGGAGrGACTAA7 7C5TAACCAT T3a! AGTCTGGT ^ 



f T r Q S Q ATHOtRN^QlAEIGAStlJtHV 



L S 0 Q 



A.::TrACKATArArAcrrTAGATrGAr7rAAAAcr:cAT:r:rAA- TTAAAAGGATcrAGcrGAAGArccrTTTrGA'AAr cTCArGAccAAAArcccT 
rvAA a tga'3 rATATATGAAArc taac taaa ftt tgaa3Taa aaat "aaattttcc tagatccac ttctaggaaaaac Tat tagagtac TG'ir tttaggga tI0 ° 

Y 5 T 1 L • I D L K L H F F K R I . V K I L F 0 N L M T < IP 

Taac jr3A3rTTrCGTrcCACTGAGC3TCAGAC;CCG7AgAAAAGA'CAAA3GArC7rCT7G A3A7CCTTTTTTTCTGC3CGTAATCTGCTGCTrGCAAA 
ir-JCACT.:AAAAGCAAGGrGACTCGCAG:C733GGCA:C7rr7C-A37TTCCrAGAAGAAC7CTAGGAAAAAAAGACGC3CArUGAC3A:3AACGTTT 12 °° 
q£F SFH.ASOPV £< I KG5S.0PFFURVICCLO 

■. AAA AAAACCACC 3C T ACCAGCGG7GGT77GT 7 7GCC3 jA':aa3 A 3CT AC CAAC TC T T T 7 7 ; CCA AGG 7 AAC TGGC T TC AGC AGAQC 5C A 3 A 7 ACC AA A 

i: ::7rT7^?3GCGA7GG7CGCCACCAAACAAACG5::TA37rC?;3AT3G7rGA3AAA-A:3CTrcCArT5ACCGAAGrCGKTCGC3::TATGGTTT 13 °° 

r<tlCPP< - P AV VCLPQ0lL P7L fP<vrG F 3 R A : ( p N 

rA::rr jr ccr:crA-3rGr4ccc3TA.3TTAG^^ 

a. ^ - A, iGAAjATCACA7c33CA:i:4A:c:33-;;73AA3":r-3A ji'iarc-irGGC -j3a : ; u r :uagc 3A3ACGA v-^-xca;*-^;; - ; ^.ccgacga ' " 1C ° 

_ _ L L 7 • ? 0 h - - < N 5 A » "> : r L A L L I L L ^ »* A A 




GCvAG TCGCGaTAAGTCGTC TCTTACCGGGTTGGACTCAAJACGATAGTTACCCGAf AAGGCGCAGCGGTCGGGCrjAACGGGGGGT'T'CGTGCACACAGC 

CGOrCACCGCTATTCAGCACAGAATGGCCCAACCTGAGrTCTGCTATCAATGGCC TATYCCGCGTCGCCAGCCCGACTYGCCCCCCAAGCACGTGrOTCG 

asgokscltgld s a r . l p o k a q a s g . t g g s c to 

CCAGCTTGGAGCGAACGACCTACACCGAACTGAGATACCTACAGC3TGAGCTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGTATCC 

i — ■ ■ ' 1 ' 1 » 1300 

GiTCGAACCrCGCTTGCTGGATGTGGCTTGACTCTATGGATGrCGCACTCGATAC TCTTrCGCGGTGCGAAGGGCTrCCCTCTTTCCGCC 76 TCCATAGG 

P 5 L E R r T Y T E L R Y L 0 R S L . S 5 A T L P E G R K APR y p 

GGTAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTATAGTCCTGTCGGGTTrCGCCACCTCTGACTT 

, 1 1 1 1 ' ■ * 1700 

CCATTCGCCGTCCCAGCCTTGTCCTCTCGCGTGCTCCCrCGAAGGTCCCCCTTTGCGGACCArAGAAATATCAGGACAGCCCAAACCGGTGGAGACrGAA 

VSGRVGTGeftT R E L P G G N A V Y L Y S P V G F R H U . L 

GAGCGrCGATTTrTGTGATGCTCGTCAGGGGGGCGGAGCCTArGGAAAAACGCCAGCAACGCGGCCTrTTTACGGrTCCrGGCCTTTTGCTGGCCTTTTG 

■ > ' ■ ' ■ ' ■ ' ' > 1800 

CTCGCAGC rAAAAACACrACGAGCAGTCCCCCCCCCTCGGATACCrTTTTGCGGrCGTTGCGCCGGAAAAATGCCAAGGACCGGAAAACGACCGGAAAAC 

E R R F L CSSGGftSLVKNASNAAFLRFLAFCVPF 

CrCACATGTTCTTTCCTGCGTTATCCCCTGATTCTGTGGATAACCGTATTACCGCCTTTGAGTGAGCTGATACCGCTCGCCGCAGCCGAACGACCGAGCG 

' 1 ' 1 ■ ' ' « 1900 

GAGTGTAC AAGAAAGGACGCAATAGGGGACTAAGACACCTATTGGCATAATGGCGGAAACTCACTCGACTaTGGCGAGCGGCGTCGGCTTGCTGGCTCGC 

AHMFFPALSP 0 S V 0 N R I T A F £ . A D T A R R S R T T £ ft 

CAGCGAGTCAGTGAGCGAGGAAGCGGAAGAGCGCCCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGATTCArTAATGCAGCTGGCACGACAGGTTT 

i ■ * ' 1 ■ - 1 ■ ' ► 2000 

GrCGCrCAGrCACTCGCTCCTTCGCCTTCrCGCGGGTTATGCGTTTCGCGGAGAGGGGCGCGCAACCGGCTAAGTAATTACGTCGACCGTGCTGTCCAAA 

OESVSEEAEERP IRKPPLPARVPIH . CSVHQRF 

CCCG4CTGGAAAGCCGGCAGTGAGCGCAACGCAATTAAT3:3AGT7AGC rCACTCATTAGGCACCCCAGGCTTTACACTTTATGCTTCCGGC TCGTATGT 

, 1 • 1 ■ ' 21C0 

CiiGC rGACCTTTCGCCCGTCACTCGCGTTGCGTrAATrACACrCAArCCAGTGAGTAArCCGrGGGGrcCGAAATGTGAAATACGAAGGCCGAGCArACA 

POVKAGSEftNAlNVS L T H . A P 0 A L M F M L P A R M 

rGTGrGGAArTGTGAGCGGArAACAATTTCACACAGGAAACAGCTATGACCATGATTACGCCAAGCGCGCAATTAACCCrCACrAAAGGGAACAAAAGCT 

, 1 ' » ' -i ' ■ i 1 ' ■ 1- 2200 

AC AC ACCTrAACACTCGCCTArTGTrAAAGrGTGTCCrrTGTZGA-ACTGGTACTAATGCGGrTCGCGCGTTAATrGGCAGTGATTTCCCTTGTTrTCGA 

LCGIV5G.QFHTGN5YOHOYAKRAINPH.RE0KL 

G-iriT^CCGGGCCCCCCCTCGAGGTCGACGjTAyCGATAAjC-TGATATCGAATTCCTGCAGCCCCTGCTCrrCACCCAGArGCrGGACCCAGAGTCCCAG 

, 1 1 ' ' ' • i 2300 

CCiArGGCCCGGGGGGGAGCTCCAGCTGCCATAGCTAr-C^AACTA-AGCTTAAGGACGTCGGGGACGAGAAGTCGGTC rACCACCTGGGTCTCAGGGTC 



-insert pLM1



-ORF pLM1



<jTGP?LEv0G10< '-D:EFLQPLLFS3MLOP-S0 

AGAAAGAGGACAGTiC AGAA7GTCCTGGArCTCCGGCAGAA"TG lAAGAGACCATGTCCAGCCTGCGAGGGrCCCAGGrGAC TCACAGCTCCC TGGAGA 

1 , 1 1 ^ ■ i ■ l 2«C0 

rCTTrCTCCTGTCACGrCTTACAGGACCTAGAGGCCGr-TTSGACC'TCTCTGGrACAGGTCGGACGCTCCCAGGGTCCAC TGAG TG TCG AGGG ACC TC T 



-insert pLM1



-ORF pLM1



3<RTvONVL0LR 5NL£ETMSSLRGSOvrHS3L E 

T'jACCTGCTACGACAGCGATGATGCCAACCCACGCAGCjTSTCAKCrCTCCAACCGC TCGTCCCCrCTyrCATGGCCCTATSGCCAGTCCAGTCCGCG 

■ 1 i ■ i 1 1 1 ■ ■ ■ i ■ * 25CO 

A'TG jACGATGCrCiTCGCTACrACGGTTG jGTGCGTC jCACiiGTrsaAGAGGTTGGCGAGCAGGGGAGACAGTACCGC-lArACCGG 'CAGG TCAGGCGC 



-insert pLM1



-ORF pLM1

CYOSO0ANP4S'JS5L SNRSSPLSV^rGOSSPR 






-insert pLM1



-ORF pLM1



Lq* GPA PSVGGSC^3£ GT P AtfYM HGEftAHYSHT 

ArOCCCAT3CGCAGCCCCAGCA^GCTCAQCCATArCTCC:::CT2GAGCrGGTCGAATCCCTGGACTCGG ArGAGGrGGACC TC AAGTCCGGCTACAfGA 
TACGGGT^CGCGrCGGGGTCGTTCGAGrCGGTATAGAJG:c:GACC7CGACCAGCrTAGGGACCTGAGCCTACTCCACCrGGAGTTCAGGCCGATGTACT 27 °° 



-insert pLM1



-ORF pLM1



MP M R S P S K L 5 H IS^LELVESLOSOEVOLKSGYM 

GCoAC AGTGACC rCATGGGC AAGACCATGACGG AGGA 73A* 3ACA "C AC TACCGGCTGGGA T*G AAAGCAGC TCCA rc AG TAG TG GAC 7C AGC GA TGCCTC 
CGCTGTCACTGGAGrACCCGTrCTGGTACTGCCrcCTACTACTGTAGTGArGGCCGACCCTACTTTCG TCGAGGTAGTCATCACCTGAGTCGCTACGGAG M °° 



-insert pLM1



-ORF pLM1



S 0 S P t M G K T M T £ 0 C 0 I X T GV0ESS5ISSGLS0AS 

AGACAATCTCAGrTCAGAAGAATTCAATGCCAGCTCCTCAC TC AA C TCC CTCCC A AG T ACT CCCAC TGCTTCTCGCAGGAACTCAACAATAG TGCTACGC 
TC TGTTAGAGTCAAGTCrTCrTAAGTTACGGTCGAGGAGT3A3TTSAGGGAGGGrTCArGA6GGTGACGAAGAGCGTCCrTGAGTTGrTArCACCArGCG 



2900 



-insert pLM1



-ORF pLM1



O N L S S £ E F N A SSS.NSLPS TPTASRRNS T I V L ft 

AC AG AC TC AG AG AA GC GC TC ACTGGCAGAAAGTGGGC TQA3C3C "TAGTCAATC AC AGG AGAAAGCCCCT AA AA AACTGGAGT ACS AC AGTGGTAGCC 
TGTcrGAGrCTCTTCGCGAG TGACCGTC T7TCACCCGAC7" -CC AAATCACTTAGTCTCCTCTTTCGGGGATTTTTTGACCTCATGCTGTCACCATCGG 



-insert pLM1



-ORF pLM1



: :)5 -< ^sla;5Gl;w-3£S££kapkkl e Y 0 S G S 

rjAAQATG-3AACCrGGGACTrCTAAGTGCCGGAGGGA A3 AGC roTGATGATTCATCCAAGGGrGGA GAAC TGAAAAAGCCCATC AGCCTGGG 

ac r:cTAccT7GGACCcrGAAGATTCAccGccTcc:T :::::: ac r;rcGACAcrACTAAGTAGGTrcccAccTcrrGAcrTTTTCGGGTAG rcGGAccc 3,00 



-insert pLM1



-ORF pLM1



L < M Z P G T S * V R a. £ = a Z S C 0 0 S SKGGELXKP ( S L G 

* ' "** 1 * 1 ■ '■ * — — ■ ■ ,i, 

C.-i';cCTG:rrcCCTSAAGAAGGGCAAGACCCCACC*:"::-3 'AACTTCCCCCArCACTCACACAGCCCAGAGrGCCCrCAAAGTCGCAGGCAAACCT 

" ' 1 * 11 i ■ ■ i - ■ i ■ t . t . i r t 2** r O 

•■■.i ■ j-j^accaaqggac rrcT TccccrrcrsGGG tgga :-c: :a:a - ~GAAGGGGGTAGrcAG7GTGrcGGcrcTCACGGCAGTr rcAGcgrccGTTTGGA 



-insert pLM1



-ORF pLM1



H ? G S L * < G * r ? » 7 1 ■« 7 5 P 1 T H 7 A Q S A L < V A G K P 
:Ar J G^CAAAGCTACAGACAASGGTAAGCT:3CA-;7GAA3.i.:'A:-::3C TCCAACGC TCC TCC TCTGATGCTGG TC GGGACCGCCTGAGrGATGCTAAGA 

"^•""'H7:GATGrcTGr7cccA7rc^^ 3X0 

"""* " ■ ■ .nsert pLVh _ 

' " CHFuLMl ■ 

J ' K * T 3 < 5 < . a v < ' : . 0 A S o S 0 a G 9 c a t s 3 A K 






GC rGCAG GCTGC TGACGCGCCC TCTGTGCGTCCGAGC "ZZZ 3-TCSGAGGGGACGCCCGCCT5GTACATGCACGGCGAACGGGCCCAC TACTCCCACACC 

■ ■ t . i— t. i ■ ■ i ■ - ... ii i i ■ i i i t - i , , i 2&OQ 

CGACGTCCGACCAC TGCGCGGGAGACACCC ACCCTCGAC3GC3AGCCTCCCCTGCGGGCGGACCATGTACGTGCCGCTTGCCCGGG TGATGAGGCrGfGC 






WO 98/24810 137 PCT/EP97/06956

Tuesday, 18 November 1997 13:57

fig 54 pLM1 (1 > 8285) Site and Sequence

AaCCCCCCTCCCCCATrGCTCCCCCCrcCACTTC5CG*::cr:C3aCTACAAGAAaCCTCCTCCTGCCACACGCACACCCACTCrCArsCAAACTGGrCG 

3400

rCGGGGGGAGCCCGTAACGAGCGGGGAGGrGAAGCCCrAGGAAGCCGATGrTCTTCGGAGGAGGACGG TG fC CG TG TC G G TG AC AG TACG T f TGACC AC C 



-insert pLMl 



-ORF pLM1



<?PSGIARPSrS55FayKKPPPATGTATVM0rGG 

r rcAGCCAcrcTCAGCAAGA tccagaag tcctcaggcatccctgtcaag ccagtaaatgggcgcaagactagcttagatg TTTCCAACAGCGCAGAGCCA 

3500

AAGTCGGrGAGAGrCGrTCTAGGTCTTCAGGAGTCC5rA2GGACAGTTCGGTCArTTACCCGCGTTCTGArCGAArCTACAAAGGTTGTCGCGTCrCGGT 



-insert pLMl 



-ORF pLMl 



S A r L S K I O K S 5 G I ? V K PVNGRKTSlOVS N S A £ P 

GuATTCCTGG CTCCTGGAGCCCGTTCTAACATCCAGTACCGCAGCCTGCCCCGGCCAGCCAAGTCAAGTTCrATGAGCGrGACCGGCGGGCGGGGTGGAC 

3600

CCTAAGGACCGAGGACCTCGGGCAAGATTGrAGGTCATScCGrCGCACGGGGCCGGTCGGTTCAGTTCAAGATACTCGCACTGGCCGCCCGCCCCACCTG 



- insert pLM1



-ORF pLM1



G F L A P 0 A W S N I Q T 3 S L P ft 9 AKSSSHSVTGGRGG 

CrCGCCCTGr GAGCAGCAGCArTGACCCCAGTCTCCTCACCACCAAGCAGGGAGGCCTTACGCCTTCCAGACTGAAGGAGCCTACCAAGGTAGCCAGTGG 

3700

GAGCGGGACACTCGTCGrCGrAACTGGGGTCAGAGGAGr*3:GGTTCGTCCCTCCGGAATGCGGAAGGTCTGACTTCCTCGGArGGTTCCArCGGTCACC 



-insert pLMl 



-ORF pLMl 



PftPvS SS10P 5LL5T<Q GGL TPSRLKCPTKVASG 

GCGGACCACrCCAGCCCCTGrCAArCAGACAGATCGGGAAAASSAJAAGGCCAAAGCCAAGGCAGTGGCC TTGGAC FCAGACAACATC TCCT TGAAGAGT 
CJCCrGGTGAGGrCGGGGACAGrTAGrcrGrCTAGC::"r"::r;:TCCGGTTTCGGTTCCgTCACCGGAACCTGAGTCTGTTGTAGAGGAACTrCTCA 



-insert pLMl 



- ORF pLMl 



C T "^APvNOrpa;< £ xakakavalqson islics 

-o:C7CCCCa.;aGAGTA C rcCCAACAACCAAGCAAGCZirZCCACACCCACCAAGCTGCCAGAGCTGCCACCAACCCCTCTCAGGGCCACAGCGAAGA 

3900

* "^vCGAGu jgTCrCTCATGAGGGTrCT73GTTCuTT* ju" j 0^3 " jTCGGTGGTTCGACCG7CTCGACGGrCGTTGGGuAGAG TCCCGGTj TCGCTTCT 



-insert pLMl 



-ORF pLM1



: ^ 33 £Sr?KNQAS^?rA TKLAELPPTPLRATAK 

j. •"G7CAAA-cAcccrcACTAGCCAA;:T7jACAA:j-:iA:-::AACAGTcrGGArcTACCArcATccAGTG A uccacccatgc ncAAAGGrccc 

•: i- A ACAGrrTG:rGGGAGTGArcGGr'A;AAcrG7:::A:-'SA;;rTG7CAGACCTAGArGGrAGTAGGrcAcrATGGTGGGTACGAAGTrTCCAGGG tlCC ° 



- insert pLM1



-ORF pLM1



S f v 9 ■ ? ? S L A ft L 0 <#N5MSLOLPSSS0TTHAS<VP 

"^'irGCArscrACAAGcrcAGCArcTG^GGc:;:: :":7:cACCCccAGKCGGCACCCATccrcAA urTAACTCAGCCASCTrcTcc 

* *- j4CJ r ACOA7G7 TCGA jrCGTAGACCCCCjGGA 2« ^aa; jitjAAGTGGGGG 7CAG jCCGfoGGf AgGAGTrAfAAf rGAGTCGG'ZSAAGAGG 



-insert pLM1



-ORF pLM1

:~r?S?A>lL*i|NSA;F 




CAGCGCCTGGAGCrAArGAGrGGTTrCACrGTCCCAAAAGAGACrCGCATGTACCCCAAACTCTCAGGCC rGCACAGGAGCATGGAGrcCCrcCAGATGC 

4200

GTCCCGGACCTCGATTACTCACCAAAGTCACACGGTTTrCTCTGGGCGTACATGGGGTTTGAGAGTCCGGACGTGrCCTCGTACCTCAGGGAGGTCTACG 



-insert pLMl 



-ORF pLMl 



0 -3 L E L H S G F S V ? < £ T P M V P K L S G L H R S H E S L 0 M 
CAATGAGCCrCCCCAGrGCCTTCCCCAGCAGTACTCCCGTCCCCACCCCACCTGCTCCCCC rGCTGCTCCCACACAAGAAGAGACGGAAGAGCrGACTTG 

4300

GrTACrCGGAGGGGTCACGGAAGGGGTCGTCATGAGGGCAGGGGTGGGGrGGACGAGGGGGACGACGAGGGTGTCrrCTrCTCTGCCrTCrCGACTGAAC 



- insert pLM1



-ORF pLM1



? M S L P S A fpsstpvptppappaapteeeteeltv 

GAGTGGAAGCCCCAGAGCTGGGCAACTGGACAGrAA'CAGCGGGATCGGAACACrCTTCC CAAGAAAGGGC TCAGG TACCAGCTTCAG TCCC AGGAGGAG 

4400

CrCACCTTCGGGGTCTC GACCCGTTGACCTGTCAr'AvSrCSCCCTAGCCTTGTGAGAAGGGTrCTTTCCCGAGTCCATGGTCGAAGTCAGGGTCCrCCTC 



-insert pLM1 



-ORF pLMl 



SGS PB A GQ LDSNO R DR N T L PKKGLRYQLQ5QEE 

ACCAAGGAGAGGCGACArTCCCATACCATTGGTGGGCTGCCTGAATCCGATGACCAGTCAGAGCTGCCTTCTCCCC CTGCACrrCCCATGTCTCTGAGTG 
TGGTTCCTCrCCGCTCTAAGGGTATGGTAACCACCCGACGGACTTAuGCTACTGGTCAGTCTCGACGGAAGAGGGGGACGTGAAGGGrACAGAGACTCAC 



-insert pLM1



-ORF pLM1 



KERRM5HT [ Q Q I ? Z SOOQSELPSPPALPMSLS 



CAAAGGGCCAACrrACCAACArAGTGAGrCCCACrGCaGCCACCACSCCAAGAATCACCCGCTCCAACAGCA TCCCCACCCACGAGGCGGCCTTCGAGCT 
G ~ 'TCCCCG f TG AATGG T TGTA TCAC TCAG5G7GACGCC3G 'GG'^CGGTTCTTAGTGGGCGAGGTTGrCGrAGGGGTGGGTGCTCCGCCGGAAGCTCGA 



-insert pLMl 



-ORF pLMl 



* < G QL TN I V 3 P T A A T T P R I T R S NS I PTrtEAAFEL 

a:ACAGCGGCrCCCAAArGGjGAGCACCCrGTCCCTG3::GAGACA:CCAAGGGAATGATrCGGTCAGGArcCTTCCG AGACCCCACGGA.; 5 A IG TTCAC 
'.'Atii rCGCCG^C-jGrTTACCCCrCGrGCGACAGGGACC 13C *ZTC "SGGTrCCCfTACTAAGCCAGTCCTAGGAAGGC "C TGGCGTGCCTGC TACAAGTG "'^ 



- insert pLMl 



-ORF pLM1



" 3 G S Q M G S T L S L A £ ? J K C M I RSGSFQOPTOOVH 

>j- T CAGTGC TG TCCC rGGCC rCCAGTGCC TCCTCCA" Ta; rCC "lACCrGAGGAGAGGATCCAA TCIGAGCAAArCCSSAAGCTTCGTAGGGAACTGG 
C : U A 3 TCAC3AC AuGG ACCSOAGG TC ACGG AGG AGG' 3 3 A' ; AGG a G fCGACKC TC TCCT ACGTT AG AC fCGT TTAGCCCTTCGAAGCATCCCrTGACC 



- insert pLM1



-ORF pLMl 



^^VLSLASSASST 'SSA EER.10SEQ I 1 < L Z s £ l 
- "C ATCCC AljGAAAAAG fG GCCaC C T T GACGT"CA 1"T "TG 1 1 AATGCTAATCTGGTGGCTGCrTTTGAGCAGAGCC T"G G TGAATAT 3ACATCCCG 

*q-aggg::: r r t r rc accgg tog aac r gcagag* i z j "acga i*t agaccaccgacgaaaac tcg rc ggacc ac r * a r r gtagggc 

" " — — — — _ ^_ _ n$ert pLMl . 

-ORF pLM1

' _ 3 " - ^ v a r , : j *. ; i -| A h L v A A F e : , ( v. : - ^ 






CCTGCGACACCTGGCAGACACGGCCGAGGAGAAGGACACri^CTGCTGGArTTGCGAGAAACCATAGACTrTCTCAACAAJlAAGAAC rCTGAGGCCCAG 

5000

GGACGCTG TGGACCGTCrCTGCCGGCTCCTCTTCCTGrGACrC3ACGACCTAAACGCTCTTTGGTArCTGAAAGACTTCrrTTTCTTGAGACrCCGGGTC 



-insert pLM1



-ORF pLM1



LRHLA£rAeeKOT£LLOLRETt0FLKKKNS£A0 

GCAG TCAT TC AGGG^GCCCT TAATGCCTC AGAAACCACACC IaaaGAACTTCGGATCAAGAGACAAAACTCC TCAGATAGCATC TCAAGCCrCAACAGCA 
CGTCAGTAAGTCCCTCGGGAATTACGGAGTCTTTGGTGTGGjrTTCTTGAAGCCrAGTTCTCrGTTTTGAGGAGTCTATCGTAGAGTrCGGAGTTGrCGT 5l °° 



- insert pLM 1 



-ORF pLM1



^ V [QGALNASETT?<ELR 1KPQNSS0S1 SSLNS 



rCAC fAGC C A TTCC AGC A TCGGCAGC AGC AAGG ATGC T5AT j;3AAAAAGAAGAAAAAAAAGAGTTGGGTC rATGAGCTTCGAAGTTCCTTCAACAAAGC 

5200

AUTGATCGGTAAGGTCGTAGCCGTCGTCGrTCCTACGACTA:3CTTTTTCTTCTTTTTrTTCTCAACCCAGATACTCGAAGCrrCAAGGAAGTTGrTTCG 



-insert pLMI 



-ORFpLMl 



I T S M S SJGSSKOAOAKKKKKKSVVrELftSSFNKA 

GrTCAGTATAAAAAAGGGGCCCAAGTCAGCTTCCTCATACTCSGATATAGAGGAGATTGCTACACCCGACTCTTCAGCCCCCTCATCCCCCAAACrACAG 

5300

CAAGJCATATTTTrTCCCCGGGTTCAGTCGAAGGAGTATGAjCtTATATC TXCTC TAACGATGTGGGCTGAGAAGTCGGGGGAGTAGGGGGTTTGArGTC 



- insert pLM 1 



-ORF pLMI 



F S I K K G P K S A S S Y 3 0 t E £ I ATPOSSAPSSPKLO 

CATGGTTCCACAGAGACTGCTrCACCCTCCATCAAGTCCrcZiCCT-GTCCTCCGTGGGCACrGATGTCACCGAGGGCCCTGCTCACCCAGCCCCCCACA 

5400

jrACCAAGGrGTCTC75ACGAAoTGGGAGjTASTTCAGGA:"33AACAGGAGGCACCCGTGACTACAGTGGCTCCCGGGACGAGTGGGTCGGGGGGTGr 



- insert pLM 1 



-ORFpLMl 



H GSTETASPS: <33"LSSVGTOVTEGPAHPAPH 

:"-olC TGrTCCArGCAAAT.jAGGAGGAGGiCCrAGAuAAjii^SAGGTATCGGAGCTGCGCrCTGAGCTArGGGAGAAGGAAATGAAGCTTACAGACAT 

5500

j»"'."CGaC AAGG TACG rrTACT"CC rcC TCC rCGG "CTC TIC **~" ~ZCA TaGCC ^CGACGCGAGAC TCGA TACCCTCTTCCT rTACTFCGAA TG TC TGTA 



- insert pLM 1 



-ORFpLMl 



r * L F H A.N££ £ £ ? r k < -1 y $ E L RSELVEKEMKLTOI 

1 TGGAGGCCC TC 4ACTC "3CCCACC AAC " j jATCACC "T 2 ^lACACCATGCACAACArGCAGT TGGAGG USGACC TGCrGAAAGCAGAGAAfGAC 

5600

J^^;rCCGaGAG"3A0AC3'jGr3GrT;j:CTA3::3i-:;::rc TCGTACGTGfTGTACGTCAACC rCCACCTGGACGACTTrCGTCTCTTACTG 



-insert pLM1



-ORF pLM1



"I E ALNSAM0.C5- i £rMM.NMQL£'J0l.LKAENO 

■: 'jiAy-j f-i"iC::;AGGCcccTCATCAa:::ccA:::;;;:::iaGrccc tggatc atctgcajt atcttccccacgccgc tccctaggtctggcac 

5700

... . ■ '. . a:q jGC jrcCjGoGAGrAGT;C3A33 r 3A3*j I* tCAGCGACC TAG fAQACGfAA fA jAAGCGGTGCGGCGAGGGAfCC 3GACCGTG 



; i p g z "i a 






TC A ^CCArTCCTTCGGCCCCAGTCTrGCAGACACAGACCI3riACCCArGGATCGCArCAG rACTTGTSG TC C A AAGO A-.1G a AG TG AC CC rCCGGGfGGf 
AGTGGGTAAGGAAGCCGGGGTCAGAACGTC TGTGTCTGGac AGTGOGTACC TACCGTAGTCArCAACACCAGGTTTCCTCCTTCAC TGCCAGGCCCACCA 



5800



-insert pLM1



-ORF pLM1



L r M S F G ? S L A Q T Q t 5 ? M 0 Q I S T C G P K £ £ y T L ft v v 

GG T G AGGA TGCC CCC G C A GC AC A tc A TC AAACGGG AC T T 5 A AG C A 3C AGG A A T TC T TC C TGG G C TG TAGC AACG TC AG TGGA AA AC T TCAC TGG AAG A TG 
CCACTCCTACGGGGGCGTCGTGTAGTAGTTTCCCCTGAACrTCGTCGTCCrTAAGAAGGACCCGACATCGTrCCAGTCACCTTrTCAACTGACCTTCTAC 



5900 



-insert pLMl 



-ORF pLM1



VftM PpQ HI I K G 0 L < Q OEFFLGCSKVSGKVOVKH 



CTGGAfGAAGCTGTrTrCCAAGTGTTCAAGGAC TATA T7 T"* AAA TGGACCCAGCCTC TACCCTGGGAC rAAGCACTGAQTCCATCCATCGCTACACCA 




6000



LOEAVFQVFK O Y I SKHDPASTLGlST 



S I H G Y S 



rCAGCCACGTGAAACGAGTGTTGGArGCAGAGCCCCCCGAGATGCCrCCTrGCCGTCGAGGTGTCAATAACATATCAGTCrcCCTCA^AGGTCT GAAGGA 
AU TCGGTOCACT TTGCTCACAACCTACGTC ^^GGGGGGCTC "acGGAGGAACGGCAGCTCCACAGTTATTGTATAGTCAG AGGG AGTTTCCAGACrrCCT 

-insert pLMl 



6100 




Z M V * * V L 0 * E P ? E « P P C R ft G V N H t S V S L K G L K £ 
GAaa TGCGTCGACAGCCTGu ^Q^^CGAGACGCTGA7Cr-"C*-^CC jATGATGCAGCACrACArAAGCCfCCTGCTGAAGCACCGGCGCCTCGTCCTCTCG 




CVQSL vFE "I 1 3 < p *t m Q H V | S L L < H p , L y L 5 

:C jAGTACCTGG fGGAGCGC TCTCCCC 3 ■ jAGGTCAC AGA 1GGC A TCG TCAGCACC T 



yj::^:CAGCGGCACGGGCAAGACCTACC TG ACC A A XC2C t 



-insert pLM1



-ORF pLM1



9 5 ° F S ' T y f * * - * S * L V E a 3 6 3 £ y T j r M t y S T 

•^ : *^*CCAGCAGr:7::;^^ 




' J M H0-3SCKDLJ 



NLA NO I D ? E T 



I G 0 v ? L V 



;A-*;r A ; r j3 Ar3AC ,- r - A .. 7gaa ;cao:c" : a:c a j :- 1 ; 



••:;::AATSG3GCCCTCA:crGCAA3-A-:ArAAAri'.:;::ArAr7ArAGGTAcc 





ACC AArCAGC CTGrAAAAATGACACCCAACCATGGCTrsCACTT3A3CTTCAGGATGTTGACCTTCrCCAACMCGTCGAjCCAGCCAATCGCTTCCTCg 

6600

r<i^TrAGTCGGACATTrTTACrGTGGGTTGGTACCGAAC37GAAC TCGAAGTCC fAC AAC TGG AAG AGGT TGTTCCACC TCGGTCGG T TACCGAAGGACC 



-insert pLMI 



-ORF pLM1



T ?l O P V K M T P M H G L * L S F ft MLTFSNNVEPANGFL 

rTCGTTACCTGAGGAGGAAGCTGGTAGAGrCAGACAGCSACArCAArGCCAACAAGGAAGAGC TGCrrCGGGTGCTCGACTGGGTACCCAAGCTGTGGTA 
AAGCAATGGACTCCTCCTTCGACCArCTCAGTCrG7CGCTG7AGTrACGGTTGTTCCTTCTCGACGAAGCCCACGAGCTGACCCATGGGTTCGACACCAT 67 °° 



-insert pLMI 



-ORF pLMI 



V * V L * ft * LVESOSC I NANtCEELLRVLOVVPKLVY 

TCA7CTCCACACCrTCCTrGAGAAGCACAGCACCrCA3ACTTCCTCATCGGCCCTTGCTTCTTTCTGTCGTGTCCCATTGGCA TTGAGGACTTCCGGACC 
AUTAGAGGTGTGGAAGGAAC TCTTCGrGTCGTGGAGTCTGAAGGAGrACCCCCCAACGAAGAAAGACACC ACAGGGTAACCGTAACTCCTGAAGGCCTGG 



6800



- insert pLMI 



-ORF pLM1



H L H f F L E K H S T S 0 F L 1 G P C F F L S C P 1 G I £ 0 F ft T 

TGGTTCArTGACCrGTGGAACAACTCTATCATTCCCTATCTACAGGAAGGAGCCAAGGATGGGATA AAGGTCCATGGACAGAAAGCTGCTTGGGAGGACC 
ACCAAGTAACTGGACACCTTGTTGAGATAGTAAGGGATAGATGTCCrTCCTCGGTTCCTACCCTATTTCCAGGTACCTGrCTTTCGACGAACCCTCCTGG 



6900 



-insert pLMI 



-ORF pLMI 



y F t O t- V N N S 1 1 P V L 3 S G A K OG IKVMGQKAAVEO 

CAoTGGAATGGGrCCGGGACACACTTCCCrGGCCATCAS:::AACAAGACCAArCAAAGCTGrACCAC CTGCCCCCACCCACCGTGGGCCCrCACAGCAT 
G T A -*C T T ACCC AGGCCC TG TG TG AAGGG ACCGG TAG "S3:rrGTrCTGGTTAGrTTCGACATGGTGGACGGGGGTGGGTGGCACCCGGGAGTGrCGTA 



7000 



-insert pLMI 



-ORF pLM1



g VEVtfRDTLP VP5AC00QSKL VHLPPPTVGPHS I 

r^-rCACCrcCCGAGGATA6GACAG TCAAAGACAGCA::::iAG-rcTCrGGAC TCAGATCCTCTGATGGCCATGCTGCTGAAACTTCAAGAAGCTGCC 

7100

AVUuA jrGGAGGGCTCCrATCCTGrCAGTrrCTG-CGr:3G"r:AA3AGACCTGAGTCTAGGAGAC TACCGGTACGACG AC XT TGAAGTTC TTCGACGG 



- insert pLM 1 



-ORF pLM1



A SPPEOftTVKO S T SLOSOPLMAMLL<LOEAA 

r ac^t r-jAG rc tccaga tcgagaa acc atcctggaccc '~±z z r tca jGCaacac r r taagggttcggcaatcac tg tcacccccggacagcagaac 

jA. ^,AAC7C AGAGGrCTAGCTCTTT>::r-GGAC: 7ZZZV T3:aaG rcCSTTGTGAAA~rcCCAAGCCGT TAG TGaC AG TGGGGGCC TG TCG1X TTG 



-insert pLMI 



-ORF pLM1



'*- |£s ? P» £riLD?%.3A TL.CFGNHCHPPrAE 

j-.' "'j iCATCAGC TAf C TTAGCTC CTCCTC 'ZZZZXCTZZ'Z "~Z ~3"3C A"TGGCTC7CCaGCCCCAGCaGCAGAACA5GA«j-j3AGGAGGA 3ATGAAAG 

7300

. - jrCGArACAArCGAGGAGGACAGGG^AGAj j~Z±-~ ~T T :"3ACCGAQAGGrcGGGG TCC TCC TC T IZ T-ZZ TCCZ TZZ TCC rCTAC TTTC 



L L L » » L L ) i'CSPAP GOEQ-I'IGGDEft 




A'j^jAGGGACACG7rC77GG7GCTG7ACC77TGAGAAC T7CC7A5GAAGGAA7GGrGGGG rGGCG77rGG GAAC77GTGCCCCC7AAACACArT7AC7GGC 

rccrcccTGrccAAGAAccACGACATGGAAACTcrTGA^GGArccrTcc Waccaccccaccgcaaacccttgaacacgggggatttgtgtaaatgaccg 7a °° 



-insert pLMl 



'i G T G S W C C T F t H r L G ft M G GVAFGNLCPLN 7 F T G 

C7:crCTAArGACrTTGGGGAAAAGATGATTCTGGGT:rTTCCCTrGACTTCTTGrTTCAATTACAAACTCCTGGGCTTrCTGG GGAGGGGrTCAGAAAA 
GAGGAGATrACTGAAACCCCrTTTCTAC 7AAGACCCA5AAAGGGAAC70AAGAACAAAG77AA 7G777GAGGACCCGAAAGACCCC7CCCCAAG7CTTT7 



7500



-insert pLMl 



*- - .U VGlC OD S GS fP .LLVS tTNSV A F W G G V 0 K 

C A TC AA A AC AC TGCAGCAGTTCCCCGG A AfTCAGCTT 3 3 AC rTAACCAGGCTGAACTTGCTCAAAAG A AGCCGA AT TCCAGC AC AC TG GCGGCCGtrACT 
37AGrTTTGrGACGrCG7CAAGGGGCCTTAAGTCGAA-::TGAATTGGTCCGACrrGAACGAGrTTTCTTCGGCTTAAGGTCGTGTGACCGCCGGCAATGA 7S °° 



-insert pLM1 



7700 



TSKHC SS SPE FSLOL TftLNLLKRSR IPAHVftPLL 

AGTTCrAGAGCGGCCGCCACCGCGGTGGAGCTCCAATTCGCCCTATAGTGAGTCGTATTACGCGCGCTCACrGGCCGTCGTTTTACAACGTC GTGACTGG 
rCAAGATC TCGCCGGCGGrGGCGCCACCTCGAGGTTAAGCGGGAT ATCACTCAGCATAATGCGCGCGAGTGACCGGCAGCAAAATGTTGCAGCACT6ACC 

V L E B P p p R y S S N S P y S E S Y Y A ft S L AVVLOftftOV 

GAAAACCCrGGCGTTACCCAACTTAATCGCCTTGCAGCACATCCCCCTTTCGCCAGCTGGCGTAATAGCGAAGAGGCCCGCACCG ATCGCCCTTCCCAAC 
CTTrrGGGACCGCAATGGGTrGAATrAGCGGAACGTC5r3TAGGGGGAAAGCGGTCGACCGCATTATCGCrTCTCC6GGCGTGGCTAGCGGGAAGGGTTG ^ 
£iy|PGV rQL><R '-* A ri > PFASV RWS E£ARr Oft P S Q 

AG7TGCGCAGCCrGAArGGCGAATGGGACGCGCCCTG7*SC3GCGCATTAAGCGCGGCGGGTGTGGTGGTrACGCGCAGCGTGACCGC TACACTTGCCAG 
^^GCGrcGGACTTACCG^^ 7900 

0 L R S L NG gV 0 A P C 5 6 ALS A A G VVV T RSVTA TLAS 

CGCCCTAGCGCCCGCTCCTTTCGCTTTCTTCCCTTCCrrrz rCSCCACG77CGCCGGC77TCCCCG7CAAGcrC7AAA7CGGCCGC TCCC77TAGGG77C 

>:oGjATCGCGGGCGAGGAAA-3CGAAAGAAGGGAAGGAAA3A3CoGTGCAAGCGGCCGAAAGGGGCAGlTCGAGArTTAGCCCCCGAGGGAAArCCCAAG 
ALAP A Pf A rrPS--LAT FAGFP RQA L NRGLPLGF 

C a-*" ^^ A ^^Y^TTTACGGCACCTCGACCCC AAAAAAC T " 3^77^3 jGTGATGGTrCACGTAGrGGGCCATCGCCCTGATAGACGGTTTTTCGCCCrTTGA 
^r iAAT CACGAAArGCCGTGGAGCrGGGGTTTTTTG^:rAArcCCACTACCAAGTGCATCACCCGGTAGCGGGACTArCTGCCAAAAAGCGGGAAACr 8I °° 
^FS ALRMLOPICKLS . SOCSR SGP S P . . T V F ft P L 

" ** ^CCA CG rrCTTTAA rAGTGGAC rC77G7TCI*AAC t3GAACAACACfCAACCC7A7C7CGG7C7A77C T77TGA77TA7AAGCGArrT7GCC 

GCAACCrCAGGTGCAAGAAArrArCACCTGAGAACAA33T:rGACr T TGTTCTGAGTTGGGArAGAGCCAGArAAGAAAACTAAATATTCCCrA W CGG ^ 
r L E 5 T F F N 5 3 L L f 0 7 S r T L N P I S V Y S F 0 L . G I L P 

i,rrrCGGCCTATrGGrrAAAAAA7GA0CTaA7rTAA:^AAAA"TAACGCGAAT77 TAACAAAArAT7AACGCrTACAArr7AG 

;'-AAGCCG3ATAACCAA777r7TACrCGAC7AAA":-"T:^AriGCGC7rAAAA77G7T77ArAAr7G03AAT3rTAAA7C 
t S A Y w ^ < n £ L I . ; < f N 4 N F N K j L T L r j 






Tuesday, 18 November 1997 11:48

fig 34 pLM4 (1 > 10070) Site and Sequence
Enzymes: 100 of 146 enzymes (Filtered)

Settings: Linear, Certain Sites Only, Standard Genetic Code

TAG TTAf TAA TAG TAATCAATTACGGGGTCATTaGTTCATAGCCCA TATA TGGAGTTCCGCG TTACATAAC TTACGGTAAArGGCCCGCCT GGC TGACC3 

ATC AATAATTATCATTAGTTAATGCCCCAGTAATCAAGTATCGGCTATATACCTCAAGGCGC AATGTATTGAA TGCCATTTACCGGGCGGACCGAC TGGZ ^ 



-pCUV 



UUIVINYGVISS P I YGVPRYI TYGK 



W P A W L T 



^CCAACGACCCCCGCCCATTGACGTC AATAATGACGrATGTT TGGAGTATTTACGGT 
3GGTrGCTGGGGGCGGGTAACTGCAGTTATTACTGCATACAAGGGTATCATTGCGGTTATCCCTGAAAGGTAACTGCAGTTACCCACCTCATAAATGCCA 2 °° 



-pCMV 



A Q R P P P t 0 V N iNOVCSH SNANRDFPL t s m g g v f t v 

AAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATTGACG TCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTA 
TrTGACGGGTGAACCGTCATGTAGTTCACATAGTATACGGTTCATGCGGGGGATAACTGCAGTTACTGCCATTTACCGGGCGGACCGTAATACGGGTCAT 



-pCMV 



NCPLGSTSSVSYAKYAPY.RO . R . MARLALCPV 



CATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGm^ 

GrACTGGAATACCCTGAAAGGATGAACCGTCATGTAGATGCATAATCAGTAGCGATAATGGTACCACTACGCCAAAACCGTCATGTAGTTACCCGCACCr 



H 0 L M G L S Y L A V HLR ISHRYYHGOAVLAVHQVAV 

TAGCGGTrTGACTCACGGGGATTTCCAA3TCrCCAC::CATTGA CGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAArGTCGTA 
A7CGCCAAACTGAGTGCCCCTAAAGGTTCAGAGGTGGGGTAACTGCAGTTACCCTCAAACAAAACCGTGGTTTTAGTTGCCCTGAAAGGTTTTACAGCAT 



' AV : t-TG ISK S?=> H . ROVEFVLAPKSTGLSKMS 

ACAACrcCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTGGTTTAG TGAACCG TC AGATCCGC TAGCGC TA 
rG T TGAGGCGGGGTAACTGCGTTTACCCGC CATC CGI ACATGCCACCCTCC AGATATATTCGTCTCGACCAAA TCACTTGGCAGTCTAGGCGATCGCGA7 

3 



-pCMV

0LRP idang r . a c T V G G l y k 0 s v f s e p s 0 P L A L 

CCGGrCGCCACCATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGG TGCCCATCCTGGTCCAGCTGGACGGCGACGTAAACGGCCACAAGrTCAGCG 
GOCCAGCGGTGGTACCACTCGTTCCCGCTCCTCGACAAGTGGCCCCACCACGGGTAGGACCAGCTCGACCTGCCGCTGCATTTGCCGGTGTTCAAGTCGC: 



2 v A 7 M V 3 K G E E L .=* TGVVP ILVELDGOVNGHKFS 

r O'CCoGCGAGGGCGAGGGCGATGCCACCTACGGCA -GCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGAC 
ACAGGCCGCTCCCGCTCCCGCTACGGTGGATGCCO'rZGACTGGGACTTCAAGTAGACGTGGTGGCCGTTCGACGGGCACGGGACCGGGTGGGAGCACTG 



v * G E G £ G 0 A TY3<LTLKF ICTTGKLPVPVPTLVT 

C ^,CC TGACC TACGGCGTGCAGTGC ttc agccgc 3 1 cgaccac ATGA AGC agcacgac ttc ttc aagtccgccatgcccgaaggctacgtccaggag 

■•j'GGGAC rGGATGCCGCACGrCACGAA37CGGC:A'::GGCTGGTG T ACrrC GTCGTGCrGAAGAAGTrCAGGCGGTACGGGCTTCCGATGCAGGTCCT: 

■■ " 1 ■ ' "~ , ' "" 

7 ' G v C ? 5 • * 0 H M K 0 H D F F K 3 A M o E G Y v 0 £ 






CGCACCATCT TC TTCAAGGACGACGGCAAC TaCAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGAC ACCCTGG rG AACCGCATCGAGCTGAAGGGCATr" 
GCG TGGTAGAAGAAGTTCC ^GCTGCCGrTGATGTTCTGGGCGCGGCTCCACTTCAAGCTCCCGCTGTGGGACCACTTGGCGrAGC TCGACTTCCCGTzKr 



0 T I FFKQDGNYK TRAEVKFEGDT 



, L V " « I E L IC G I 



TTCAAGGAGGACGGCAACATCCTGGGGCACAAGCrGGAGTACAACTACAACAGCCACAACGTCTATA TC ATGGCCGA CAAGCAGAAGAAf err a > 
TGAAGTTCCTCCTGCCGTTGTAGGACCCCGTGTTCGACCTCATGTTGATGrTGTCGG TGTTGCAGATAfAG TACCGGC TGTTCGTCTTCTTnrrrrT^^TT 



DFKEOGN ILGHKLEYNYNSHNV 



YIMAOKQKNGI 



GGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCA TCGGCGACGGCCCCGTGCTGCTG 
CCACTTGAAGTTCTAGGCGGTGTTGTAGCTCCTGCCGTCGCACGTC GAGCGGCTGGTGATGG TCGTCTTGTGGGGGTAGCCGCTGCCGGGGCACGACGAC 



VNF K IRHN tEOGSVQLAOH 



YQQNTP IGOGPV 



L L 



CCCGACAACCACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCA^CGAGAAGCGCGATCACATGGTCCTGCTGGAGTrCG 
GGGCTGTTGGTGATGGACTCGTGGGTCAGGCGGGACTCGTTTCTGGGGTTGCTCTTCGCGCTAGTGTACCAGGACGACCTCAAGCAC TGGCGGCGGCCCT 



PON HYLST Q S ALS KOPNEKRDHMVLLEFVT 



A A G 



TCACrCTCGGCATGGACGAGCTGTACAAGTCCGGACrCAGATCTCGAGCTCAAGCTTCGAATTCTGCAGTCGATAAGCTTGATATCG AA TTCC TGCAGCC 

agtgagagccgt acctgctcgacatgttcaggcctgagtctagagctcgagttcgaagcWaagacgtcagctattcgaactatag 

— LJJLU 



tlgmdel y<S3 



LQSRAQASNSAVOKLO 



I E F U Q F 



CCTGCrCTTCA GCCAGATGCTGGACCCAGAG7CCCAGA3AAAGAGGACAGrGCAGAATGTCCrGGATCTCCGGCAGAACCTGGAAGAGACCATGTCCA G: 

GGACGAGAAGTCG GTCTAC GACCTGGGTCTCAGGGTC tgtttctcc t gtcacgtcttacaggacctagaggccgtcttggaccttctctggtacaggtcg 



' ORF pLMl 

L L F . S Q M > 0 P E 3 Q * . « » T V Q w VLDLR0NLEETMS3 



CrGCGAGGGTCCCAGGTGACTCACAGCrcCCTGGAGA'GACCrGCTACGACAGCGArGArGCCAACCCACGCAG CGTGTCCAGCC 
GACGCTCCCAGGGTCCACTGAGTGTCGAGGGACCrCTACTGGACGATGCTGTCGCTACrACGGTTGGGTGCGTCGCACAGGTCGi 



TCTCCAACCGC TCG" 



GAGAGGT TGGCGAGl A 




GSOVTHSSLE 



-ORF pLMl 

TCY0SD0ANPRSVSSL3NR 



~.CZC TCTGTC ATG GCGC TA TGGCCAGTCCAGTCZGCG 3C TGC AGGC TGGTGACGCGCCCTCT3TGCGTGGGAGC TGCCGC TCGGAGGGGACGCCCGCC X'j 
iGGGAGACAGTACCGCGAT ACCGGTCAGGTCAGGCGCC3ACGTCCGACCAC TGCGCGGGAGACACCCACCCTCGACGGCGAGCCTCCCr T^rGGGC 




ORF pLMl 



L J v 5 ( c 



s 5 " 3 - A G 0 A P 3 v 0 G 3 C R S £ q 




G "AC A 7GC AC GGCGAACGGGCCC AC T AC TCCC AC ACC A TGC CCA TGCGCAGCCCCAC-C AAGC rCAGCCATATC TCCCGCC *GG AoC TGC- 7CGAA"CCC TG 

CA-GTACGTGCCGCTTGCCCGGGTGATGAGGGTGTGGTACGG3rACGCGTCGGGGTCG7rCGAGTCGGTATAGAGGGCGGACCrCGAi:CAGCrrAG3u>i^ *"*''' 



-insert pLM1



-ORF pLM1

VMHGERAHY5HTMPMRSP5KL5H ! SRLELVESL 

GAC TCGGATGAGGTGGACC TCAAGTCCGGC TACATGAGCGAC AGTGACCTCATGGGCAAGACCATGA CGGAGGATGATGACATCACTACCGGCTGGGATG 
CrGAGCCTACTCCACCTGGAGTTCAGGCCGATGTACTCGCTGTCACTGGAGTACCCGTTCTGGTACTGCCrCCTACTACTGTAGTGATGGCCGACCCTAC 



- insert pLM1



-ORF pLM1



CSOEVDLKSGYMSDSDLMGKTMTEDDD I TTGWD 

AAAGCAGCTCCATCAGTAG tggactcagcg atgcctcagacaatctcagttcagaagaattc aatgccagc tcctcac tcaactccc tcccaagtactcc 
tttcg tcgaggtag tc atcacctgagtcgctacggagtctgt tagagtcaagtcttcttaag ttacggtcgaggagtgagttgagggagggttcatgagg 



- insert pLM1



-ORF pLM1



ESS5ISSGLS0A50NLSSEEFNASS5LNSLPSTP 

cactgcttctcgcaggaactcaacaatagtgctacgcacagactcagagaagcgctcactggcagaaagtgggctgagctggtttagtgaatcagaggau 

' — ' 1 1 r i ! ! ■ ) ■!■ I 2 (C'-' 

GToACGAAGAGCGTCCTTGAGTTGTTATCACGATGCGTGfCTGAGTCTCTTCGCGAGTGACCGTCTTTCACCCGAC TCGACCAAA TCAC TTAGTCTCC TC 



-insert pLM1 



-ORF pLMl 



* A 3 R R NST !VLRTDSEKRSLAESGLSVFS£SEE 
AiAGCCCCTAAAAAACrGGAGTACGACAGTGGTAGCCTGAAGATGGAACCTGGGACTTCTAAGTGGCGGAGGGAGCGGCCTGAGAoCTG'GATGATTCA- 



"C3GGGATTTTTTGACCTCATGC 7G TC ACC A ~CGG AC TTC rACCTTGGACCCTGAAGATTCACCGCC TCCC TCGCCGGAC TCTCGACAC 7 AC 7AAGTA 



-insert pLM1 



-ORF pLMl 



< A P K K LEYCSG3LKME^GTSKVRRERPE3C003 

CC AAGGG TGGACAAC 7GAAAAA CCCC ATCAGCC'OGG CC ACCC7GGTTCCC 7GAAGAAG33C AAGACCCCACC TG TGGCTGTAAC 7TCCCCCATCACTCA 
GG " *CCC ACC TC TTGAC TT 7TTCGG3 TAG TCGGAC CC G3TGG3ACC AAGG3 AC TTCT7CCCGTTCTGGGGTGGACACCGACAT7G AAGGGGGTAGTGAG7 



-insert pLMl 



-ORF pLM1



L K < P ! S, G'-fPGSLKKGIC 7 P P V A V T 3 5 I TH 






C ACAGCCCAGAGTGCCC TC A A AG TCGCA3GCAAACCrGAGGCCAAAGCTACAGACAAGGGTAAGCrrGCA3TGAAGAATACrGuGCTCLAACGC TCCT."" 
GT3TCGGGTC TC ACGCGAGTTTCAGCGTCCC7TTGGACTCCCGTTTCGATGTC TGTTCCCAT7 CGAACG TCACTTCT7AfGACCCuAGG "TGCGAGGAG*! 

-insert pLM1 



-ORF pLMl 



T A Q 3 A L K V A G K P E G K A T P K GKLAVKNTGLQRS3 

TCTGATGCTGGTCGGGACCGCCTGAGTGATGCTAAGAAGCCCCCCrCGGGCATTGCTCGCC CCTCCACTTCGGGATCCTTCGGCTACAAGAAGCCTCCTC 
AGACTACGACCAGCCCTGGCGGACTCACTACGATTCTTCGGGGGGAGCCCGTAACGAGCGGGGAGGTGAAGCCCTAGGAAGCCGATGTTCTTCGGAGGA3 



-ORF pLMl 



SDAGRORLSOAKKPPSG IARPSTSG.SFGYKKPP 

CTGCCAC AGGCACAGCCAC TGTCATGCAAj&CTGGTGGTTCAGCCACTCTCAGCAAGATCCAGAAGTCC TCAGGCATCCCTGTCAAGCCAGTAAATGGGCu 
GACGGTGTCCGTGTCGGTGACAGTACGTTTGACCACCAAGTCGGTGAGAGTCGTTCTAGGTCTTCAGGAGTCCGTAGGGACAGTTCGGTCATTTACCCGC 



-insert pLM1 



-ORF pLMl 



PA TGTATVMQTGGSA TLSK iQKSSGIPVKPVNGR 

CAAGACTAGCTTAGATGTTrCCAACAGCGCAGAGCCAGGATTCCTGGCTCCTGGAGCCCG TTCTAACATCCAGTACCGCAGCCTGCCCCGGCCAGCwAAJ 
G" rCTGATCGAATCTACAAAGGTTGTCGCGTCTCGGTCC TAAGGACCGAGGACCTCGGGCAAGATTGTAGG TCATGGCGTCGGACGGGGCCGGTCG37T1I 



-insert pLM1 



-ORF pLM1 



K TSLDVSNSAEPGFLAPGARSN lOYRSLP^P 



TC AAG rTCTATGAGCGTGACCGGCGGGCGGGGTGGACCTCGCCCT GTGAGC AGCAGCA7TGACCCCAGTCTCC TCAGCACCAAGCAGGCaGGCCTTaCG- 
AG'TCAAGATAC FCGC ACTGGCCGCCCGCCCCiCC TGGAGCGGGACAC rCGrCGTCG7MACTGGGGTCAGAGGAGTCGT3GTrCGTCCC~CCGGAA TGC'l 



-insert pLMl 



-ORF pLM1



S S S M S y 



GRGG3RPVSSSIDPSLL3TK0 



G L T 



C ' TCCAGAC TGAAGGAGCC TaCCAAGG fAGCCAG'GGG C 3GACCAC TCCAGCCCC TG7CAArCAGACAGATCGGGAAAAGGAG AAGGCCAAAGCCAAGG- 
^AAGGTC rGACTTCCrCGCATG37TCCATCGGT:ACC:GCCTGGTGAGGTCGGGu^^ 



-insert pLM1



-ORF pLM1



f" 3 s L .< 



SG3T7=>AP 



NOTCREKiKi 




AG TGGCC TTGGACTCAGAC AACA7C TCC TTGAAG AGT A T TGGCTCCCC AG AGAG T AG TCCCAAGAACCAAGC AAGCCACCCC ACAGCCACCAAGCTGGCA 
TCACCGGAACCTGAGTCTGTTGTAGAGGAACTTCTCATAACCGAGGGGTCTCTCATGAGGGTrCTTGGTTCGTTCGGTCGGGrGTCGGTGGTTCGACCGr ^ 



-ORF pLM1 



V A L 0 S 0 N ISLKS IGSPESTPKNQASHPT 



A T K L A 



GAGCTGCCACCAACCCC TC TCAGGGCCACAGCGAAGAGC TTTGTCAAACCACCCTCACTA GCCAATCTTGACAAGGTCAAC TCCAACAG TCTGGATC7AC 
CTCGACGGTGGTTGGGGAGAGTCCCGGrGTCGCTTCTCGAAACAGTTTGGrGGGAGTGATCGGTTAGAACrGTTCCAGTTGAGGTrGTCAGACCTAGATG 



•insert pLM1 



-ORF pLMI 



£ L P P TPLRATAKSFVKPPSLANLOKVNSNSUOL 



CATCATCCAGTGATACCACCCATGCTTCAAAGGTCCCAGATCTGCATGCTACAAGCTCA GCATCTGGGGGCCCTCTCCCTTCCTGCTTCACCCCCAGTCC 

GTAGTAGGTCAC TATGGTGGGTACGAAGTTTCCAGGGTC tagacgtacgatgttcgagtcgtagacccccgggagagggaaggacgaag tgggggtcagg 



-insert pLMI 



-ORF pLM1 



p SSSOTT HAS >CVP 0 LHATSSASGGPLPSCFTPSP 

ggcacccatcctcaatattaactcagccagcttc tcccagggcctggagc TAATGAGTGGT TTCAGTGTGCCAAAAGAGACCCGCATGTaCCCCAAACTC 
CCGTGGGTAGGAGTTATAATTGAGTCGGrCGAAGAGGGTCCCGGACCTCGATTACTCACCAAAGTCACACGGTTTTCrCTGGGCGTACATGGGGTTrGAG ^ 



-insert pLMI 



-ORF pLM1 



A P 1LN1NSA3F3Q G L ELMSGF3VPKE7RMY P K L 
7CAGGCCrGCACAGGA5CATGGAGTCCCTCCAGATGCCAA TGAGCCTCCCCAGTGCCTTCCCCAGCAGTACTCCCGTCCCCACCCCACC7GCTCCCCCTG 

' ' 1 1 1 *" r . ■ ■ , . i . . , ~ VV 

AG 7CCGG ACG TGTCC TCG T ACC 7C AGGG AGGTC 7ACGG 77AC TCGGAGGGG 7CACGGAAGGGGTCG 7CA 7G AGGGCAGGGC 7GGGGTGG ACGAGGG SGAC 



-ORF pLMI 



GLHPSME3L0MPM5LPSAFPSSTP 



P 7 P P A P P 



C:^C7CCCACAGAAGAAGAGACGGAAGAGC7GA:-rG:A3TGGAAGCCCCAGAGCTGGGCAACTGGACAGTAArCA GCGGGArCGGAACAC7Cr7C:CAA 
G-^CGAGGG7GTC7TC77CT:7GCCT7CrCGACT3AACCrCACC77CGGGG7CTCGACCCG7TGACC7G7CA7TAG7CGCCC7ACCCTTG7GAGAAGGGT7 



■insert pLMI 



-ORF pLMI 



* A P T - ^ ^ T ^^i-*vSGSPRAGQL03M'JRDf?M^LF ^ 



GAAAGGGC ~C AGGTACC AGCTTCAG TCCCAGGAGGAGACCAAGGAGAGGCGACATTCCCATACC A T TGGTGGGC7GCC TGAA TCCGATGaCCaC TCAGA" 
C TTTCCCGAGTCCATCGTCGAAGTCaGGGTCC TCCTCTGGTTCCTC TCCGCTGTAAGCGTATGCTAACCACCCGACGGAC TTAGGCTAC TGGTCAG TC J" 




ORF pLM1 



« G L , R Y 0 > Q S Q E E T K E R R H S H T I G G L P E S 0 0 Q S E 
CTGCCTTCTCCCCCTGCACrTCCCATGTCTCTGAGTGCAAAGGGCCAACTTACCAACATAGTGAGTCCCACTGCGGCCACCACGCCAAGAATCACCCr.r 




LPS Pp AUPMS LSA KG QLTN IVSPTAATTPR [ TP 
CCAACAGCATCCCCACCCACGAGCCGGCCTTCGAGC^^ 

GGTTGTCGTAGGGGTGGGTGCTCCGCCGGAAGCTCGACATGTCGCCGAGGGTTTACCCCTCGTGGGACAGGGACCGGCTCTCTGGGTTCCCTTACTAAGC 




5 N S ! P T * £ A A F E L Y S G S 0 M G S T L SLAERPKGIilR 
GTCAGGATCCTTCCGAGACCCCACGGACGATGTTCACGGCTCAGTGCTGTCCCTGGCCTCCAGTGCCTCCTCCACCTACTCC TCAGC TG AGG AG AGGA TG 




-ORF pLM! 



3 G3 frRD PT 00VHG SVLSLASSASSTYSSA 



E R I* 



C AA TC TGAGCAAATCCGGAAGCTTCGTAGGGAACTGG A A TC A TCCC AG GAAAAAGTGGCCACCTTGACG TC TCA GC T 7 TC TGC C AATGC TAATC TGSTG3 
G"AGACTCGTTTAGGCCTTCGAAGCATCCCTTGACCTTAGTAGGGTCCTTTTTCACCGGTGGAACTGCAGAGTCGAAAGACGGTTACGATTAGACCAC: 




-ORF pLMl 



:i S Z C 1 * K I R R £ L E S S Q E K VATLTSQLSANANLV 
C TCC T TT TGAGCAGAoCC T 3G TC AA "ATGACA TCCCG CC TGCGACACC TGGCAGAGACGGCCGAGGAGAAGGACACTGAGCTGCTGGA7TTGCGAGAAA" 




F E 0 S 



-ORF pLMl 

■NMTS^ LRHLAETAEEKOTE 



L 0 . 



E r 
C A TAG AC T7TCTGAAGAAAAAGAACTC TGAGGCCCAuGC AGTCATTCAGGGAGCCCT TAA TGCC TCAGAAACC ACACCCAAAGAACTTCGGATCAAGAGm
«j *A7C TG-AAGACTTCTTTTTCTTGAGACTCCGoGTCCG TCAGTAAGTCCCTCGGGAATTACGGAGTCTTTGG TGTGGGTTTCTTGAASCCTAGTTCTCT 

insert pLM1 — 

ORF pLM1

IDrLKKKNSEAQAV [ GGALNASETTPKEL3 IKR 

CAAAACTCCTCAGATAGCATCTCAAGCCTCAACAGCATCACTAGCCATTCCAGCATCGGCAGCAGCAAGGATGCTGATGCGAAAAAGAAGAAAAAAAAGA 
GTTTTGAGGAGTCTATCGTAGAGTTCGGAGTTGTCGrAG TGATCGGTAAGGTCGTAGCCGTCGTCGTTCCTACGACTACGCTTTTTCTTCTTTTTTTTCT 

insert pLMI 

QRF pLMl 

CNSSOSISSLNSITSHSSIGSSKOAOAKKKKKk 



GTTGGGTCTATGAGCTTCGAAGTTCCTTCAACAAAGCGTTCAGTATAAAAAAGGGGCCCAAGTCAGCTTCCTCATACTCGGATATAGAGGAGATT6CTAC 
CAACCCAGATACTCGAAGCTTCAAGGAAGTTGTTTCGCAAGTCATATTTTTTCCCCGGGTTCAGTCGAAGGAGTATGAGCCTATATCTCCTCTAACGAT3 



— insert pLM1 — 

ORFpLMI ~ 

SVVYELRSSFNXAFS IKKGPKSASSYSO I E E IAT 

ACCCGAC TCTTC AGCCCCCTCATCCCCC AAAC TAC AGCATGGTTCCACACAGACTGCTTCACCCTCCATCAAGTCC TCC ACC TTG TC C TCCG TGGGCAC T 
TGGGC TGAGAAG TCGGGGGAGTAGGGGG TTTGATGTCGT ACC AAGGTG TCTCTGACGAAGTGGGAGGTAGTTCAGGAGGTGG AAC AGGAG3CACCCGTGA 

— insert pLM1 



— — ORFpLMI 

P0SSAPSSPKLGHGSTETASPSIKS3TLSSVGT 

GA7GrCACCGAGGGCCCTGCrCACCCAGCCCCCCACACTAGGCTGTTCCATGCAAATGAGGAGGAG GAGCCAGAGAAGAAGGA3GTATCG3AGC TGCGC* 
C "AC AGTGGCTCCCGGGACGAGTGGGT CGGGGGG'GTGArCCGACAAGGTACGTTTAC TCC TCCTCCTCGG TC TCT TCTTCC TCC ATAGC: "CGACGCGA 

insert pLM1 



— ORFpLMI 

rvT£G?AHPAPHTRLFHANEEEE?EKK£V3ELR 

C r GAGCTA:GGGAGAAGGAAA TGAAGCTTACAGAC A T Z Z 3C T TGGAGGCCC TCAACTC rGCCCACC AAC TGGA rCAGC T T CGG3AGACC AT 3CACAACAT 
GaCT:gaTacCCTCT TCCTTTACTTCGAATG'CG'A3GCGAACCTCCGGGAGTTGAGACoGGTGGTTGACCTAGTCGAAGCCCTCTGG"ACGTGTTGTA 

— insert pLMI 

ORFpLMI 

> ^ L V E < £ M K L T Z : 3 L E A L N 3 A H Q L 0 0 L 3 E T "1 H rl P 
• GCAGTrG3AGGTGGACCTGCrGAAAGCAGAGAATGAC:3ACTGAAGGTAGCCCCAGGCCCCrCATCAGGCrCCACTCCAGGGCAGGTCCCr3GATCA7C"

CGTCAACCrcCACCTGGACaACrTTCGTCTCTTACTGoCrGACTTCCATCGGGGrcCGGGGAGTAGTCCGAGGTGAGGT CCCGTCCAG^ACCrAGrAGM ^ ' 

insert pLMl - 

ORF pLM1

CLEVDLLKAENDRLKVAPGPSSGSTPGQV3GS.5 

GCATT ATCTTCCCCACGCCGCTCCCrAGGCCTGGCACTCACCCATTCCrTCC-GCCCCAGTCTTGCAGAC ACAGACCTGTCACCCATGGArGGCATCAGTH 
CGTAATAGAAGGGGTGCGGCGAGGGATCCGGACCGTGAGTGGGTAAGGAAGCCGGGGTCAGAACGTCTGTGTCTGGACAGTGGGTACCTACCGTAGTCAT * 

insert pLMl - 

ORF pLM1

ALSSPRRSLGLALTHSFGPSLADTOLSPMDGIS 

CTrGTGGTCCAAAGGAGGAAGTGACCCTCCGGGTGGTGGTGAGGATGCCCCCGCAGCACATCATCAAAGGGGACTTGAAGCAGCAGGAATTCTTCCTGGG 
GAACACCAGGTTTCCTCCTTCACTGGGAGGCCCACCACCACTCCTACGGGGGCGTCGTGTAGTAGTTTCCCCTGAAC TTCGTCGTCCT7AAGAAGGACCC 

-insert pLMl 



ORF pLM1

TCGPKEEv TLRVVVRMPPQH! IKGOLKQQEFFLG 

CrGTAGCAAGGTCAGTGGAAAAGTTGAC TGGAAGATGCTGGATGAAGC TGTrTTC CAAGTGTTCAAGGACTATATTTCTAAAATGGACCCAGCC TC TACC 
GACATCGTTCCAGrCACCrTTTCAACTGACCTTCrACGACCTACTTCGACAAAAGGTTCACAAGTTCC rGATATAAAGATTTTACCTGGGTCGGAGATGCi 

— insert pLM1 . 



— ORF pLMl — — 

CSKVSGKVOVKMLDEAVFOVFKDY I3KM0 = AST 



CT3GGACTAAGCAC TGAGTZCATCCATGGCTACAGCATC AGCCACG TG AAACGAGTG 77GGA TGCAGAGCCCCCCGAGATGCC TCCT TGCCGTCGAGGTG 
GACCC 7G A7TCG TGAC 7CAGG TAGGTACCGA'GTCGTAG TCGGTGCACTT T3C TCACAACCT ACGtCTCGGGGGGC TCTACGGAGGAACGGCAGCTCCAU ° 



insert pLMl 

ORFpLMI 

L 5 L ~- T E3 i HG YS ISHVKRVLCAEPPEMPPCRRG 

T»:AATAACATA7CAGTC7C:CTCAAAGGrCTG^AGGAGAAATGCGTCGACA3CC7GG7 GrTCGAGACGCTGATCCCCAAGCCGATGATGCAGCACTACA' 
AG" 7A 7TG7A 7AGTCAGAG 3GAG 7T TCCAGAC T7CCTZ TTTACGCAGC 7G T CGGACC AC AAGC TC TGCG AC TAGGGGTTCGGC TACTACG7CGTGATGTA 

— insert pLM1 



— ORFpLMl — 

v * ! tl 1 S v S I :< G < £ K C V 0 3 L V F E " L I P K » M t< :: H Y I 



^^CCrCCTGCTGAAGCACCGGCGCC7CGTCCr:rc3G GCCCCAGCGGCACGGGCAAGACCrACCrGACCAATCGCTTGGCCGA2rACCT53T33AGrG" 
T-CGGAGGACGACTTCGTGGCCGCGGAGCAGGAGAGCCCGGGGTCGCCGTGCCCGTTCTGGATGGACTGGrTAGCGAACCGGCTCATGGACCACCTCG*"" "'^ 



-ORF pLMl 



LLLKHRRLVL5GPSGTGKTYLTNRLAEY 



L V £ R 



TCTGGCCGTGAGGTCACAGAGGGCATCGTCAGCACCTTCAACATGCACCAGCAGTCTTG CAAGGATCTGCAACTGTATCTTTCCAACCTAGCCAACCAGA 

agaccggcac tcc agtgtc tcccgtagcagtcgtggaagttgtacgtggtcgtc agaacgttcctagacgttgacatagaaaggWggatcggttgg-c" 

-insert pLM1 



-ORF pLM1



SGREVTEG IVSTFNMHQQSCKOLQLYLSNLANO 



T^qACCGGGAAACAGGAATrGGGGATGTGC CCCTGGTGArTCTATTGGATGACCTGAGTGAAGCAGGCTCCATCAGTGAGTTGGTCAATGGGGCCCTCA; 
aTCTGGCCCTTTGTCCTTAACCCCTACACGGGGACCACTAAGATAACCTACTGGACTCACTTCGTCCGAGGTAGTCACTCAACCAGTTACCCCGGGAGTu ^ 



-insert pLM1 



-ORF pLM1



ORETGIGOVPLV 



LLOOLSEAGS ISELVNGALT 



C'cCAAGTATCATAAATGTCCCTATA TTATAGGTACCACCAATCAGCCTGTAAAAATGaCaCCCAACCATGGCTTGCACTTGAGCTTCAGGATGTTGAC; 


GACGTTC ATAGTATTTACAGGGATATAATATCCA*GG7GGTTAGTCGGACATTTTTAC TG TGGGTTGGTACCGAACGTGAAC TCGAAGTCCTACAACTG3 * " * 



-insert pLM1



-ORF pLM1



K Y H K C P Y I IG 



TNOPVKMTPNHGLHLSFftMLT 



'':rcCAACAACGTGGAGCCAGCCAA7GGCTTC:'G3TTCGTTACCTGAGGAGGAAGCT GGTAGAGrCAGACAGCGACATCAATGCCAACAAGGAAGA>:>: 
A.-:A33TTGTTGCACCTCGGTCGGTrACCGAAG3AC:AAGCAATGGAC TCC TCCTTCGACCATCTCAG7CTGTCGC TGTAGT TACGGTTGT7CC 7TC7C3 



■insert pLM1 



-ORF pLM1



-' S .N N V E P A NGFLVR YLRRKLVESD SD tNAHKEE 

~>jZ rrCGCGTGC TC G A C 7G 3G T AC C C A A 3C TG 73 "A " CATC TCCACACC TTCCTTGAGAAGCACAGCACCTCAGACTTCCTCATCGGCCCTTGCTTCT" 
3AAGC CC ACG AGC T 3AC CCA 7GGG 7 TCGaCACC a 7 A3 TAG AGGTGTGGAAGGAAC TCT7CGTG7CGTGGAG TC TGAAGGAGTA3CCGGGAAC3AAGAA 



insert pLM1





I 


- R V L 0 ■< V P :< 


-ORF pLM1 

CLVvhlhTFLEKHS T30F«_ 


1 G => 


: f f 
~C TGTCG TGTCCCATTGGCATTGAGGAC TTCCGGACC TGGTTCA TTGACC TGTGGAACAAC TCTATCATTCCC rATCTA CAuGAAoGjiijr; AASSArGGI-
AGAC AGC ACAGGGTAACCG TAACTCC TGAAGGCC TGGACCAAGTAACTGGACACCTTGTTGAGATAGTAAGGGATaGA TG TCC TTCCTCGG T TCC TAT t"" 




L S C , P 1 G E 0 F » T V F [ D L W M N S I I P Y L 0 E.G A K 0 G 

ATAAAGGTCCATGGAC AGAAAGC ^GCTTGGGAGGACCCAGTGGAATGGGTCCGGGACACACTrCCCTGGCCATCAGCCC AACAAGACCAA TCAAAGCTrt" 
^ A TTTCC A ^Q^ A CCTGTCT f TCGACGAACCCTCCTGGGTCACCTTACCCAGGCCCTGTGTGAAGGGACCGG TAGTCGGGTTGTTCTGGTTAGTTTCGACA 



- insert pLM1



-ORF pLM1



K G G K A A W t D P V £ tf V R p T LPVPSAQQOOSKL 



ACCACCTGCCCCCACCCACCGTGGGCCCTCACAGCATTGCCTCACCTCCCGAGGATAGGACAGTCAAAGACAGCACCCC AAGTTCTCTGGACTCAGATCC 
TGGTGGACGGGGGTGGGTGGCACCCGGG AG TGTCG TAACGGAGTGGAGGGC TCCTATCCTGTCAGTTTC TGTCGTGGGGTTC AAGAGACCTGAGTCTAG3 

- insert pLM1



52 ;X 




■ H L P P P T . V G p ,H S 1 A 5 P P E 0 R T V K 0 5 T P S S L D S D P 

TC TGATGGCCATGC TGCTGAAACTTCAAGAAGCTGCCAACTACATTGAGTCTCCAGATCGAGAAACCATCC TGGACCCCAACCTTCAGCCAA CACTTTAA 
A'JaCTACCGGTACGACGACTTTGAAGTTCTTCGACGGTTGATGTAACTCAGAGGTCTaGC tctttggtaggacctggggttggaagtccgttgtgaaatt ^ 



-insert pLM1



L MAMLLKLOE 



-ORF pLM1



A N Y t ESPORETtLOPNLQATC 



GGG TTCGGCAATCAC TGTC ACCCCCGCACAGCA3 AACGH TGGCATCAGCTATCTTAGCTCCTCCTCTCCCCTC TCCTCTTTC A 3AGCAC ^GGCTCTCCAIi 
CCCAAGCCGT TAGTGAC AG TGGGGGCC TGTCG TC TTGCGACCGTAG TCGA TAGAATCCAGGAGGAGAGGGGAGAGGAuAAAG TCTCGTOaCCGAGAG^t:: 



— insert pLM1

GNHCHPRTAE5VH0LS 



LULSPLLF03TGSP 



C^CAGGAGGAGAACAGGA3GGA GGAGGAGA-GAAA3AGGAGGGACAGGTTCTTGGTGCTGTACCTTTGAGAACrTCCTAGGAAGGAATGGTGGGGTGG: 
O'JUG TCC TCC TC TTG TCCTCCCTCC TCC TC TAC""""C "CC TCCC TG TCCAAGAACC ACGACA TGGAAAC TC TTGAAGGATCC TTCCT TACC ACCCC ACCU 



G G E Q £ G G 



-insert pLM1



0 E 



3GTG3WCCTFENFLGRNG 



G r~ ^SGGAAC T TGTGCCCCCTAAACACATT TACTGGC C TCC TCTAA7GAC TTTGGCGAAAAGATGATTCTGGGTC TTTCCC TTGACT T CTTGTTTC AAT ~ 
«_ V.AACCC T TGAACACGGGGGA TT TG TG TAA A TQA" ZZZ±ZO~ZA T7AC TG AAACCCC " * T TC TACTAAGACCC AGAAAGGC AAC TGAAGAAC AAAG T TAA 



?i L C P l N 



insert pLM1 —

L. . L v G K o 0 S G 



:> I 
ACAAACTCCTGGGCTTTCTGGGGAGGGG TTCAGAAAACA rCAAAACACTGCAGCAGT TCCCCGGAATTCAGCTrGGACTTAACCAGGCTGAACTTGCrCA
TGTTTGAGGACCCGAAAGACCCCTCCCCAAGTCTTTTGTAGTTTTGTGACGTCGTCAAGGGGCCTTAAG TCGAACCTGAATTGGTCCGACTTGAACGAGT ^ 



-insert pLM1



T N S W A F V G G V Q K T S K H C S$SPEFSLDLTR Lnll 
AAAGAAGCCGAATTCCAGCACAC ^GGCGGCCGTTACTAGrTCTAGATAACTGATCATAATCAGCCAT ACCACATTTGTAGAGGTTTTACTTGCTTTAAAA 

tttcttcggcttaaggtcgtgtgaccgccggcaatgatcaagatctattgactagtattagtcggtatggtgtaaacatc tccaaaatgaacgaaattt* 5eC< 
insert pLM1

KftSRIPAKURPLLVtO N.S. SAlPHL.RFYL.l_.> 
AACCrCCCACACCTCCCCCTGAACCTGAAACATAAAATGAATGCAATTGTTGTTGTTAACTTGTTTATTG CAGCTTATAATGGTTACAAAfAAAGCAATA 

ttggagggtgtggagggggacttggactttgtattttacttacgttaacaacaacaattgaacaaataacgtcgaatattaccaatgtttatttcgttat 

T $ H T S P . T . N I K . M Q L L L L TCLLQL IMVTJMKA1 

GCATCACAAATTTCACAAATAAAGCArTTTTTTCACTGCATTCTAGTTGTGGTTTGTCCAAACTCATCAATGTA TCTTAACGCGTAAATTGTAAGCGTTA 
CGTAGTGTTTAAAGTGTTTATTTCGTAAAAAAAGTGACGTAAGATCAACACCAAACAGGrTTGAGTAGTTACATAGAATTGCGCATTTAACATTCGCAAT 



i -H Qft — 

A S G I SOIKHFFHC ILVVVCPNSSMYLNA. [ V S V 



A7ATTTTGTTAAAATTCGCGTTAAATTTTTGTTAAATCAGCTCATTTTTTAACCAATAGGCCGAAATCGGCAAAATCCCTTATAAATCAAAAGAATAGAC 



rATAAAACAATTTTAAGCGCAAfTTAAAAACAATTTAGTCGAGTAAAAAATTGGTTATCCGGCTTTAGCCGTTTTAGGGAATATTTAGTTTTCTTATCTG 



7IO." 



* L L K F A L NF C I S S F F N Q . A E I G KIPYK5KE.T 

C'jAGATAGGGTTGAGTGTTGTTCCAGTTTGGAAC AAGAQTCC ACTA TTAAAGAACGTGGACTCCAACGTCAAAGGGCGAAAAACCGTC'ATCAGGGCGAT 
GCTCTATCCCAACTCACAACAAGGTCAAACC7TGT7CTCAGGTGATAATTTCTTGCACCTGAGGTTGCAGTTTCCCGCTTTTTGGCAGATAGTCCCGCTA 



-lion 



£ I G L S V V P V 'W N K S PLLKNVDSNVKGRK T V Y 0 G 0 

GOLCCAC TAG GTG A ACC ATCACCC TAATC A AG TTTTTTGGGG TCGAGG TGCCGTAAAGCACTAAATC GGAACCCTAAAGGGAGCCCCCGATT7AGAGC 7" 
CCGGGTGATGCACrTGGTAGTGGGATTAGTTCAAAAAACCCCAGCTCCACGGCATT-CGTGArTTAGCCTTGGGATTTCCCTCGGGGGCTAAArCTCGAA 



- l ion 



GPL REp 3P.S 3FL GSR CRKALNR.NPKGSPPFRA 

GACGGGGAAAGCCGGCGAACGTGGCGAGAAA^^A^GGGA AGAAAGCGAAAGGAGCGGGCGCTAGGGCGCTGGCAAGTGTAGCGGrCACGCTGCGCGTAAC 
C r GCCCC TTTCGGCCGC7TGCACCGC7C TTTC "TCCC TTC TTTCGCTTTCCTCGCCCGCGATCCCGCGACCG TTCACATCGCCASTGiGACGCGC A7TG 



? G * P A M V A 5'<£GKKAKGAGARALA3VAVTLftVT 

C AC c AC ACCCGCCGCGC 7 TAA7GCGCCGC 7 AC A3 GGCGC G7CAGGTGGC AC TTTTCGGGG A AATG7GCGCGGAACCCC T ATT TG T 77 A"7TTC 7AAA TA 
G "^GTG TGGGCGGCGCGAATTACGCGGCGATG TCCC3C3CAG TCCACCGTGAAAAGCCCC T T TACACGCGCC T TGGGGA7AAACA.-.- T *AAAAG AT 77 AT 

: > 



'PAALNA^L G A 5 G G r F R G M 



G T = I C _ 
fig 34 pLM4 (1 > 10070) Site and Sequence

CArrCAA^rATGTATCCGCTCATGAQACAATAACCCrGArAAATGCTrCAATA ArATTGAAAAAGGAAGAGTCCTG^GGCGGAAAGAACCAGCTGTGGAA 
GTAAGn rATACATAGGCGAGTACTCTGTTATTGGGACTATTTACGAAGTTATTATAACrTT TTCCTTC TCAGGAC TCCGCC TTTCTTGGTCGACACCTT 
HSN HYPLMRQ . p. .MLQ.Y.KRKSPEAERTSCG 



TGfGTGTCAGTTAGGGTGTGGAAAuTCCCCAGGCTCCCCAGC AGGC AGAAGrATGCAAAGCATGCATCTCAATTA GTCAGCAACCAGGTGTGGAAAGTC" 

acacacagtcaatcccacacctttcagggg tccgaggggtcgtccg tcttcatacgtttcgtacgtagagttaatcagtcgttggtccacacctttcagg 77C ' 

H C V S . G V E S P Q A P Q Q A £ V C K A C I S I SQQp G VESP 

ccaggctccccagcaggcagaagtatgcaaagcatgcatct^ 

ggtccgaggggtcgtccgtcttcatacgtttcgtacgtagagttaatcagtcgttggtatcagggcggggattgaggcgggtagggcgggga 7eC ' : 

Q A P Q Q A E V C K A C I S ISQQP.SRP.LRPSRP.LP 
CCAGTrCCGCCCATTCTCCGCCCCATGGCTGACTAATTTTTTTTATTTATGCAGAGGCCGAGGCCGCCT CGGCCTCTGAGCTATTCCAGAAGTAGTGAGG 

ggtcaaggcgggtaagaggcggggtaccgactgattaaaaaaaataaatacgtctccggctccggcggagccggagactcgataaggtc ttcatcactcc 

P v p P *_ I ft P H A D . F F L FMORPRPPRPLSYSRSSE 
AGGCTTTTTTGGAGGCCTAGGCTTTTGCAAAGATCGATCAAGAGACAGGATGAGGATCGTTTCGCArGAT TGAACAAGATGGATTGCAC GC AGG TTC TCC 

tccgaaaaaacctccggatccgaaaacgtttctagctagttctctgtcctactcctagcaaagcgtactaacttgttctacctaacgtgcgtccaagags & " CC 



EA frt -£A. AFA K10QE TG.GSFRMIEODGLHAGSP 

ggccgcttgggtggagaggctattcggctatgactgggcacaacagacaatcggctgctctgatgc cgccgtgttccggctgtcagcgcagggg 
ccggcgaacccacctctccgataagccgatactgacccgtgttgtctgttagccgacgagactacggcggcacaaggccgacagtcgcgtccccgcggg: SVJ> ' 



-Kan/Neo



A A v v er lfgyovaqqtigcsoaavfrlsaqgrp 



GTCTTTT-GTCAAGACCGACCTGTCCGGTGCCCrGAATGAACTGCAAGACGAGGCAGCGCGGCTAT CGTGGC TGGCC ACGACGGGCGTTCC TTGCGCA3 
CAAGAAAAACAGTTCTGGCTGGACAGGCCACGGGACTrACTTGACGTrCTGCTCCGTCGCGCCGATAGCACCGACCGGTGCTGCCCGCAAGGAACGCGTC 



Kan/Neo



v L F , V KT 0L 3GALNELODEAARLSVLATTGVPCA 



C — rGCTCGACCTTGTCAC TGAAGCGGGAAGGGACTGGC TGC TATTGGGCGAAGTGCCGGGGCAG GATCTCCTGTCATCTCACCTTGCTCCTGCCGAGAA 
Ga^aCGAGC TGC AACAGTGACTTCGCCCTTCCCTGACCGACGATAACCCGC rTCACGGCCCCGTCCTAGAGGACAGTAGAGfGGAACGAGGACGGC TCTT 



-Kan/Neo



AV LDVVT EAG Rg L LL GEVPGQOLL S3HLAPA£> 

AG TATCC A TCATGGC To ATqCAAT3CCGCGGC T3C ATACGCT TGATCCGGC TACC TGCCC ATTCGACCACCAA GCGAAACATCGC ATCGAqCGAGC ACG' 
rCA-AGGTAGTACCGACTACGTTACGCCGCCGACGTATGCGAACTAGGCCGATGGACGGGTAAGCrGGTGGrTCGCTrTGTAGCGTAGCrCGCTrfiTr.r^ 



VSIMA Q AM3RLHT LD PATCPFOHQAKHR IERAR 

ACTCGGATGGAAGCCGGTCTTGTCGATCAGGATGATCTGGACGAAGAGCATCAGGGGCTCGCGCCAGCCGAAC TGTTCGCCAGGC TC AAGoCGAGCATGC 
'GAGCC r ACC TTCGGCCAGAACAGC T AG TCCTAC "AG AC C TGC T TC TCGT AGFCCCCGAGCGCGGrCGGCTTG AC AAGCGGTCCGAG TTCCGC fCGTACl: 



Q * £ A 3 L V C 0 D I* 0 E E H (J G L A P A £ •_ F A R L K A 3 f* 
CCGACGGCGAGGATCTCGTCGTGACCCATGGCGATGCCTGCTTGCCGAAT ATCATGG TGGAAAATGGCCGC TTTTCTGGA77C ATCGAC 73 7GGCCGG"


GGC TGCCGCTCCTAGAGCAGCACTGGGTACCGCTACGGACGAACGGCTTATAGTACCACCTTTTACCGGCGAAAAGACCTAAGTAGCTGACACCGGCCGA 



Kan/Neo



POGEOLVV THGDACLP N [HVEN G R F S G F I D C G R I 

GGGTGTGGCGGACCGCTATCAGGACATAGCGTTuGCTACCCGTGATATTGCTGAAGAGCTTGGCGGCGAATGGGC TGACCGC TTCCTCG7GCTTTACGG" 


CCCACACCGCCTGGC GATAGTCCTGTATCGCAACCGATGGGCACTATAACGACTTCTCGAACCGCCGCTTACCCGACrGGCGAAGGAGCACGAAATGCCA 



GVAOR YQOIALATROIAE elgge vaorflvlvg 

ATCGCCGCTCCCGATTCGCAGCGCATCGCCTTCTATCGCCTTCTTGACGAGTTCTTCTGAGCGGGACTCTGGGGTTCGAAATGACCGACCAAGCGACGCC 


TAGCGGCGAGGGCTAAGCGTCGCGTAGCGGAAGATAGCGGAAGAACTGCTCAAGAAGACTCGCCCTGAGACCCCAAGCTTTACTGGCTGGTTCGCTGCG3 


I AAPOSQR IAF Y R L L D E F F . A G L V G S K . P T K R R 

caacctgccatcacgagatttcgattccaccgccgccttctatgaaaggttgggcttcggaatc gttttccgggacgccggctggatgatcctccagcg: 
gttggacggtagtgctctaaagctaaggtggcggcggaagatactttccaacccgaagccttagcaaaaggccctgcggccgacctactaggaggtcgcg 
ptchheis ippppsmkgvasesfsgtpag.sssa 

ggggatctcatgctggagttcttcgcccaccctacggggaggctaactgaaacacggaaggagacaataccggaaggaacccgcgctatgacggcaataa 


cccctagagtacgacctcaagaagcgggtggga7ccccctccgattgactttgtgccttcctctgttatggccttccttgggcgcgatactgccgttatt 

giscvsssptlggg lkhgrrqyrkepal.rq. 
aaagacagaataaaacgcacggtgt7gggtcgtt7gt tc ataaacgcggggttcgg7cccagggctggcac tc tgtcgatacccc accgag accccattg 


777CTGTCTTATTTTGCG7GCCACAACCCAGCAAACAAGTATTTGCGCCCCAAGCCAGGGTCCCGACCGTGAGACAGCTA7GGGG7GGCTCTGGGGTAA: 
KORIKRTVLGRLF INAGFGP R A G T L 3 j P H R D P 1 

GGGCCAATACGCCCGCGT7TCT7CC7TT7CCCCACCCCACCCCCC AAGTTCGGGTGAAGGCCCAGGGCTCGCAGCCAACG7CGGGGCGGCAGGCCCTGC: 
CCCGGTTATGCGGGCGCAAAGAAGGAAAAGGGG7GGGGTGGGGGGTTCAAGCCCACTTCCGGGTCCCGAGCG7CGGTTGCAGCCCCGCCGTCCGGGACGU 
GAM TPAFLPF?*? TP0VRVKA0G50PTSGRGALF 

ATAGCC TCAGGrTACTCATATA7ACTTT AGATTG A7TTAAAACT TC A7TTTTAA7TTAAAAGGATCTAGGTGAAGATCC7T7 T73ATAA7C 7CATGACCA 
TA7CGGAGTCCAATGA3TATATATGAAATCTAAC7AAATrTTGAAGTAAAAATTAAATTrTCCTAGATCCACTTCTAGGAAAAA:rAT7A3AGTAC TGG' 
P Q V 7 H I Y F R L I NF IFNLKGSR R S F L f IS P 

aaaTCCCTTAACGTGAGTT r7CG7 7CCAC 7GAGCG7C A3ACCCCGTAGAAAAGA7CAAAGGATC T7C TTGAGA TCCT7T7TT 7CT3CGCG7 AA7CTGC TU ^ 
f 77AGGGAAT TGC AC TCAAAAGC AA3GT3AC 7CGCA37C 7GGGGCA7C TT TTCTAGTTTCCT AGAAGAACTCT AGGAAAAAAA3ACGCGCA77AGACGAC 



E 



-pUC ori



<SLNVSFRS7ER3TP. KRSKOLLE I L F F C A . S A 



CT"GCAAACAAAAAAACCACCGC7ACCAGCGGT3G77 73TTTGCCGGATCAAGAGCTACCAA CTC77TTTCCGAAGG7AAC7GGCTTCA0:aGAGCGCA: ^ 
GAACGTTTGr TrTrTr3GTGGCGArGGrCGCCACCAAACAAACGGCCTAG7rCTCGATGGTTG AGAAAAAGGCr7CCATrGACC3AAG7C37CrCGCGT: 

■ ■ ■ pi II '.-Ml — 1 ■ ■■■■ — ' " ■ ■' 

ACK0:<MHRr2»vFvCRIKSY0LFFRP.LA3AERF 



AfACCAAATACTGTCCTTC TAGTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTC TG TAGCACC6CCTACATAC C TCGC TC TGC TAATCC TGTTACCA" 
TATGGTTTATGACAGGAAGATCACATCGGCATCAATCCGGTGGTGAAGTTCTTGAGACArCGTGGCGGATGTA TGGAGCGAGAC GATTAGGACAATGfiT" ^ 



V 0 I L S F 



C S R S 



A T T S R T t 



HRLHTSLC 



S C Y 0 



TGGCTGCTGCCAGrGGCGATAAGTCGTGTCTrACCGGGTTGGACTCAAGACGATAGTTACCGGATAAGGCGCAG CGGTCGGGC TG AACG6GGGG TTCG T" 

accgacgacggtcaccgctattcagcacagaatggcccaacctgagttctgctatcaatggcctattccgcgtcgccag cccgac TTGccccccAAGrA^ 97c ' 

H u ' w — — _ 
v L I P V A I S ft V L P G V T Q D D S Y R I R ft S G R A £ R G V R 

CACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATACCTACAGCGTGAGCTATGAGAAAGCGCCACGC TTCCCGAAG 

GTGTG TCGGGTCGAACCTCGCTTGCTGGATGTGGCTTGACTCTATGGATGTCGCACTCGATACTCTTTCGCGGTGCGAA GGGCTTCCCTCTTTCCGCC ^ 



AHSPAVSERPTPN 



OTYSVSYEKAPRFPKG 



E R R T 



AGGTATCCGGTAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTAT AGTCCTGTCGGGTTTCGCCACC 
TCCArAGGCCATTCCCCGTCCCAGCCTTGT^ 



G I R 



AA GSEQESA RGS FQ GETPG IFIVLSGFAT 
TCTGACTTGAGCGTCGATTTTTG TGATGCTCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCC7TT TTACGGTTCCTGGCCTTTTGCTC: 

agactgaactcgcagctaaaaacactacgagcagtccccccgcctcggaIacctttttgcggtcgttgcgccggaaaaaIgccaaggaccggaaaacga: l ° C( 



50LSV0FCDARQGGGAYG 



ktpatrpfygswpfa 



GCC TTTTGC T C AC ATG T TC T T TCC TGCG TT A 7QC CC TGATTCTGTGGATAACCG TAT TACCGCCATGC AT 



cggaaaacgagtgtacaagaaaggacgcaataggggactaagacacctattggcataatggcggtacgta 

G L L Lr CSFLRY?LlLWITVLPPCI 



1007C 
Enzymes: 72 of 146 enzymes (Filtered)

Settings: Linear, Certain Sites Only, Standard Genetic Code

TAGTTATrAATAGrAATCAATTACGGGGTCATTAGTTCATAGCCCATATATGGAGTTCCGCGTTACATAACTTACGGTAAArGGCCCGCCTGGCTGAC*""* 


ATCAATAATTATCATTAGTTAATGCCCCAGTAATCAAGTATCGGGTATATACCTCAAGGCGCAATGTATTGAATGCCATrTACCGGGCGGACCGACTGGC 

LLIVIMYGVISS.PIYGVPRYITYGKVPAVLT 
Aat II Aat II

CCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGrCAATGGGTGGAGTATTTACGGT 


GGGTTGCTGGGGGCGGGTAACTGCAGTTATTACTGCATACAAGGGTATCATTGCGGTTATCCCTGAAAGGTAACTGCAGTTACCCACCTCATAAATGCCA 

AQRPPPIOVNNOVCSHSNANRDFPLTSMGGVFTV 

Bgl I Nde I Aat II Bgl I

AAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTA 

— 1 ' ' 1 ' 1 ■ ' ■ 300 

TTTGACGGGTGAACCGTCATGTAGTTCACATAGTATACGGTTCATGCGGGGGATAACTGCAGTTACTGCCATTTACCGGGCGGACCGTAATACGGGTCAT 



N C P L 


G S T S S V S 


YAKYAPY.RQ.R MARLALCPV 


SnaB I Nco I

CATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGA 


GrACTGGAATACCCTGAAAGGATGAACCGTCATG 
HOLMGLSYLA V 


7AGATGCATAATCAGTAGCGATAATGGTACCACTACGCCAAAACCGTCATGTAGTTACCCGCACCT 
HLR ISHRYYHGDAVLAVHQVAV 


TAGCGGTTTGACTCACG3G3ATTTCCAA3TC7CC 


Aat II

i 

ACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTA 


ATCGCCAAACTGAGTGC:CCrAAAGGTTCAGAG3 
1AV.LTGISKS ? 


i ^uoGTAACTGCAGTTACCCTCAAACAAAACCGTGGTTTTAGTTGCCC FGAAAGG TTTTACAGCAT 

PH. RQVEFVLAPKSTGLSK'MS. 


^CAACTCCGCCCCATTGACGCAAATuGGCGGTAa 


Nhe I Eco47

! ! 

GCGTGTACGGTGGGAGGTCTATaTAAGCAGAGCTGGTTTAGTGAACCGTCAGATCCGCTAGCGCTh 


tgttgaggcggggtaactgcgtttacccgccat: 
clrpioangr. 


CGC ACATGCCACCCTCC AGATATATTCGTC TCGACCAAATC ACTTGGC AGTCTAGGCGATCGCGAT 
ACTVGGLYK03VFSEPSDPLAL 


Nco I

CC3GTCGCCACCATG3 Tc-jC AAGuCCGAGGAG: 


"GTTC ACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGrAAACGGCCACAAGTTC AGCS 


ggccagcggfggtacc a:t:gttcccc: tcc~cg 


ACAA3TGGCCCCACCACGGGTAGGACCAGC TCGACC TGCCGC TGC A rTTGCCGGTGTTCAAG TCGZ 










-VAT 


M v 5 K G E E 


LFTGVVPILVELDG0VNGHKF3 


:G r CCGGCGAGGGCGA33GCGATGCCACCTACG3 


-AA3C TGACCCTGAAGTrCATCTGCACCACCGGCAAGCTGCCCGTGCCC T3GCCC ACCCTCGTGAC 


^'-AGGCCGCTCCCGC TCCC 3C tacggtggatgcc 


jTCGACTGGGACTTCAAGTAGACGTGG TGGCCGTTCGACGGGC ACGGG ACCGGGTG3GAGC ACT3 




V 3 G E 


5E3DATY3 


eGFP.C.e.unc53 — — —

< L T L K F 1 C T T G K L P v P V P F L V T 
fig 30 pEGFP72 (1 > 9697) Site and Sequence



C ACCC TGACC TACGGCG TGCAGTGC TTCAGCCGC TAC 


CCCGACCACArGAAGCAGCACGACTTCTTCAAGTCCSCCATGCCCSAAGGCTACCiTCCAGGW 




GrGGGACTGGATGCCGCACGTCACGAAGTCGGCGATGGGGCTGGTGTACTTCGTCGTGCTGAAGAAGTTCAGGCGGTACGGGCTrcCGAToCAGGTCCT" 










TLTYGVOCrSRY 


eGFP.C.e.unc53 — — —

P OHM KQHDFF.<SAMPEGYyQE 




f<spl 

I 

CGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCArCGAGCTGAAGGGCATCG 




GCGTGGTAGAAGAAGTTCCTGCTGCCGTTGATGTTCTGGGCGCGGCTCCACTTCAAGCTCCCGCTGTGGGACCACTTGGCGTAGCrCGACTTCCCGTAG- 


iocc 








R T IFFKDOGNYK 


eGFP.C.e.unc53 — — ; 

T RAEVKFEGOTLVNR IELKGI 




AC TTC AAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACT AC AACAGCCACAACGTCTATATCATfifirrfiir AA<^r ara Aii.r *.rr.nr at/~ a .-. 




TGAAGTTCCTCCTGCCGTTGTAGGACCCCGTGTTCGACCTCATGTTGATGTTGTCGGTGTTGCAGATATAGTACCGGCTGTTCGTCTTCTTGCCGTAGTT 


> ICC 








OFKEDGNILGHKL 


eGFP.C.e.unc53 

EYNY NSHNVY t MADKQKNG IK 




GGrGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCACCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTG 




CCACTTGAAGTTCrAGGCGGTGTTGTAGCTCCTGCCG TCGCACGTCGAGCGGC TGGTGATGGTCGTCTTGTGGGGGTAGCCGC TGCCGGG3CACGACGAC 


1 20C 








VNFK1RHNIE0G 


eGFP.C.e.unc53 

SVQLAOHYQQNTP IGDG PVLL 




CCCGACAACCACTACCTGAGCACCCAGTCCGCCCTGAG 


CAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACC3CC3CCGGGA 




GGGCTGTTGGTGATGGACTCGTGGGTCAGGCGGGACTCGTTTCTGGGGTTGCTCTTCGCGCTAGTGTACCAGGACGACCTCAAGCACTGGCGGCGGCCCT 










=»ONHYLSTQSALS 


eGFP.C.e.unc53 —

KOPNEKftDHMVLLEFVTAAG 




BspM II 

'CACTCTCGGCATGGACGAGCTGTACAAGTCCGGACrC 


PS* 11 EcoN I 

: ; 

AGATCrACGTCAAATGTAGAArrGATACCAATCTACACGGATTGGGCCAATCGGCACCTTT: 


AGTGAGAGCCGTACCTGCTCGACATGTTCAGGCCTGA3 


TCTAGATGCAGTrTACATCTTAACTATGGTTAGATG7GCCTAACCCGGTTAG:CGTGGAAAc 












eGFP.C.e.unc53 




' r L G M 0 E l Y < S G l 


aSTSNVEL IP I YTDWAN^HLS 




GaaGGGC AGC TTaTCAAAG rCGATTAGGGATATTTCC A 


Nru I EcoR I

! j 
ATGATTTrCGCGACTATCG ACTGG TTTC "CAGC TT ATTAATG 1*3 A7CGT TCCGmTC AACGAA 


C "CCCCTCGAATAGT r7CAGCTAArc;CTArAAAG3* 


T AC T AAAAGCGC f 3A TAGC 7GACC AAAGAG TCG AA TAA T 7AC AC T A3CAAGG: 7A37TGC T : 












eGFP.C.e.unc53 — — —




* -'i U i * i =0.1 : 


C.e. unc53 ■ 

' D F R 0 ■( P L V 3> ) L 1 ll I v - 1 M E 
fig 30 pEGFP72 (1 > 9697) Site and Sequence

Bsm I 

I 

T "C TCGCCTGCATTCACGAAACG fTTGGCAAAAATCACATCGAACC TG GATGGCCTCGAAACGTGTC TCGACTACCTGAAAAATC rGGGTCTCGAC TGCT 


AAGAGCGGACGTAAGTGCTTTGCAAACCGTTTTTAGTGTAGCTTGGACCTACCGGAGCTTTGCACAGAGCTGATGGACTTTTTAGACCCAGAGCTGACGA 



•eGFP.C.e.unc53 



-C.e. unc53 



FSPAFTKRLAK I TSNL OGLE TCLOYLKNLGLDC 

Ear I 

EcoR V Pvu II Ksp632I Hind III

CGAAACTCACCAAAACCGATATCGACAGCGGAAACTTGGGTGCAGTTCTCCAGCTGCTCTTCCTGCTCTCCACCTACAAGCAGAAGCTTCGGCAACTGAA 


GCTTTGAGTGGTTTTGGCTATAGCTGTCGCCTTTGAACCCACGTCAAGAGGTCGACGAGAAGGACGAGAGGTGGATGTTCGTCTTCGAAGCCGTTGACTT 



-eGFP.C.e.unc53 



-C.e. unc53 



SKLTKTOIOSGNLGAVLQLLFLLSTYKQKLROLK 


PmaCI 

Sst II 1 PmaCI 

AAAAGATCAGAAGAAATTGGAGCAACTACCCACATCCATTATGCCACCCGCGGTTTCTAAATTACCCTC GCCACGTGTCGCC ACG TCAGCAACCGCTTCA 


TTTTC TAGTCTTCTTTAACCTCGTTGATGGGTGTAGGTAATACGGTGGGCGCCAAAGATTTAATGGGAGCGGTGCACAGCGGTGCAGTCGTTGGCGAAGT 



-eGFP.C.e.unc53 



-C.e. unc53 



K0QKKLEQLPT5 i MPPAVSKLPSPRVATSATAS 

GCAAC TAACCCAAATTCCA ACTTTCC AC AAATGTC AACA TCC AGGC TTCAGACTCCACAG TCAAGAATATCGAAAATTGATTC ATCAAAGATTGGTATCA 
CGTTGAT TGGGTTTAAGGT TGAAAGGTG TTTACAG TTGTAGG TCCG AAGTCTGAGGTGTC AGTTCTTATAGCTTTTAACTAAGTAGTTTCTAACCATAGT 



•eGFP.C e.unc53 



-C.e. unc53 



A T"MPNSNFPQM3rSRLQTPQSRIS!<iOSSK [ G I 

Aat II 

AG:CAAAGACGTCTGGACTrAAAC CACCCTCAT:Ar:AACCACTTCArCAAATAATACAAArTCATrcCGTCCGTCGAGCCGrrCGAGTG5CAATAATAA 
rC3GTrTCrGCAGACCrGAATTTGGTG3GAGTA3'A3TTGGTGAAGTAGTTTATTATGTrTAAGTAAGGCAGGCAGCrCGGCAAGCTCACCGTrATrATr 



eGFP.C.e.unc53 —

~"™ ~~ C.e. unc53 

k:3 ^TSGL<??SS£ r T S S N N T M 5 F R P .5 S 3 3 3 0 M M D 
Ear I

EcoR V Ksp632I Asu II

I i I 

rGTTGGCTCGACGATATCCACATCTGCGAAGAGCTTAGAATCATCATCAACGTACAGCTCUrT TCGAATCTAAACCGACCrACCrCCC^ACTCCAAAAH 

ACAACCGAGCTGCTATAGGTGTAGACGCTTCTCGAATCTTAGTAGTAGTTGCATGTCGAGATAAAGCTTAGATrTGGCTGGArGGAGGGTrGAGGTTTTT "'^ 



-C.e. unc53 



VGST ISTSAKSLESSSTYSSl SNLNRPT5QLQK 

Xba I Nhe I 

I ! 

CCTTCTAGACCACAAACCCAGCTAGTTC3TGTT3CTACAACTACAAAAATCGGAAGCTCAAAGCTAGCCGC TCCGAAAGCCG TGAGCACCCC AAAACTTG 
GGAAGATCTGGTGTTTGGGTCGATCAAGCACAACGATGTTGATGTTTTTAGCCTTCGAGTTTCGATCGGCGAGGCTTrCGGCACTCGTGGGGTTTTGAAC " 



-eGFP.C.e.unc53



-C.e. unc53 



PSRPQTQLVRV ATTTK IGSSKLAAPKAVSTPk'L 



Bsm I 

CTTCTGTGAAGACTATTGGAGC AAAACAAGAGCCCGArAACAGCGGTGGTGGTGGTGGTGGAATGCTGAAATTAAAGTTATTCAG TAGC AAAAACCCATC 

GAAGACACrTCTGATAACCTCGTTTTGTTCTCGGGCTATTGTCGCCACCACCACCACCACCTTACGACTTTAATTTCAArAAGrCATCGTTTTTGGGTAG " * 



-eGFP.C.e unc53 



-C.e. unc53 



A S V K T 1 G_ A K0t?0NSGGGGGGMLKLKLFSSKNP.3 

TTCCTCATCGAATAGCCCACAACCTACGAGAAAGGCGGCGGCGGTGCCTCAACAACAAACTTTGTCGAAAATCGCTGCCCCAG TGAAAAGTGGCCTGAAG 
AA3GAGTAGCTTATCGGGTGTTGGATGC TC TTTCCGCCGCCGCCAC GGaGTTGTTGTTTGAAACAGCTTTTAGCGACGGGGTCACTTTTCACCGGACTTC 



-eGFP.C.e.unc53 



-C.e. unc53



3 S S M SPQPTRKAAAVPGQQTLSK !AAPVK*GL> 

BstX I Hind III

CCG C CGACC AG f A AGC TGGGAAG TGCC ACGTCT a 'G TC 3 AAGC T TTGf AC GCC AA AAG T f TCC TACCGTAAAACGGACGCCCCAAfC A' AT: fCAAC A 
G'JCGGC TGG TCaT TCGACCCT TC ACGG ToCAGATACAGC TTCGAAACATGCGG T7TTCAAAGGATGGCATTTTGCC TGCGGG3 rrAGTA"A3AGTTG"TC 



-eGFP.C.e.unc53



C.e. unc53 — — — ____ 
3 r3KL GSAT3. M SKLC TPKVSYRK T .0 A a I : -5 0 . 0 
Ear I

Ksp632I BspM II

i i 

ACTCGAAACGATGCTCAAAGAGCAGTGAAGAAGAGTCCGGATACGCTGGATTCAACAGCACGTCGCCAACGTCATCATCGACGGAAGGTTCCCTAAGCA7 


rGAGCrTTGCTACGAGTTTCTCGTCACTTCTTCTCAGGCCTATGCGACCTAAGTTGTCGTGCAGCGGTTGCAGTAGTAGCTGCCTTCCAAGGGATTCGTA " 



eGFP.C.e.unc53 



C.e. unc53 . — 

DSKRCSKSSEEESGYAGFNSTSPTSSSTEGSLSI^ 



Bsm I 
Sph I 
i Ava III 

! Nsi I 

I 1 

GCATTCCACATCTTCCAAGAGTTCAACGTCAGACGAAAAGTCTCCGTCATCAGACGATCTTACTCTTAACGCCTCCATCGTGACAGCTA7CAGACAGCCG 
CGTAAGGTGTAGAAGGTTC TCAAGTTGCAGTCTGCTTTTCAGAGGCAG TAGTCTGCTAGAATGAGAATTGCGGAGGTAGCAC TGTCGATAGTCTGTCGGC 



eGFP.C.e.unc53



C.e. unc53 — — — — — — — — — — - _ 

HSTSSKSSTSOEKSPSSOOLTLNASIVTAIROP 



Ssp I 

ATAGCCGCAACACCGGTTTCTCCAAATATTA7CAACAAGCCTGTTGAGGAAAAACCAACACTGGCAGTGAAAGGAGTGAAAAGCACAGCGAAAAAAGATC 


TATCGGCGTTGTGGCCAAAGAGGTTTATAATAGTTGTTCGGACAACTCCTrTTTGGTTGTGACCGTCACTTTCCTCACTTTTCGTGTCGCTTTTTTCTAG 



eGFP.C.e.unc53 



C.e. unc53 — — — ^ 

AATPVSPNI tNKPVEEKPTLAVKGVKSTAKICO 



PmaCI 

Pvu II ! PmaCI EcoR V 

111 I 
CAZCTCCAGCTGTTCCGCCACGTGACACCCAGCCAACAATCGGAGTTGTTAGTCCAATTArGGCACATAAGAAGTTGACAAArGACCCCGTGATATCTGA 

G"3GAGGTCGACAAGGCGGTGCACTGTGGGTCGG7T3TTAGCCTCAACAATCAGGTTAATACCGTGTArTCrTCAACTGTTrACTGGGGCACTATAGAC7 



eGFP.C.e.unc53



— - C.e. unc53 

??PAVPPR07Q?7 IGVVSPiMAHKKLTNOPy ISE 



Alwn I 

I 

AAAACCAGAACC TGAAAAGC7CC AA7CAATGAGCA7C3 ACACGACGGACG 77CCACCGC7TCCACC "C 7AAAA TCAGT7GT7CCACT7AAAATGAC 77CA 
r ":3GTCr7GGACTT7TCGAG3TTAGT7ACTC2"A2:T3TGCTGCCTGCAAGG7GGCGAAGGTGGAGA7TTTAGrCAACAAGGTGAArT7TACTGAAG" 



eGFP.C.e.unc53 



~ ■ ■ — C.e. unc53 

?■ 3 E P E K L C S M 5 10 T 7 0 V P P L P P L K 5 V V P L •: M T 3 
fig 30 pEGFP72 (1 > 9697) Site and Sequence

Spl I

ir 

ArCCGACAACCACCAACGTACGATGTTCTTCTAAAACAAGGAAAAATCACATCGCCrGTCAAGTCGTTTGGATATGAGCAGTCGTCCGCGrCTGAAGACT 

taggctgttggtggttgcatgctacaagaagattttgttcctttttagtgtagcggacagttcagcaaacctatactcgtcagcaggcgcagacttctgh 3,C '' 



- eGFP.C. e.unc53 



-C.e. unc53 



I RQPPTYDVLLKQGK I fSPVKSFG YEQ SSAS£0 

CCATrGTGGCTCATGCGTCGGCrCAGGTGACTCCGCCGACAAAAACTTCTGGTAATCATTCGCTGGAGAGAAGGATGG GAAAGAATAAGACATCAGAATC 
GGTAACACCGAGTACGCAGCCGAGTCCACTGAGGCGGCTGTTTTTGAAGACCATTAGTAAGCGACCTCTCTTCCTACCCTTTCTTATTCTGTAGTCTTAa 



-eGFP.C.e.unc53 



- C.e. unc53



S1VAHASAQVTPPTKTSGNHSLERRMGKNKTSE3 


CAGCGGCTACACCTCTGACGCCGGTGTTGCGATGTGCGCCAAAATGAGGGAGAAGCTGAAAGAATACGATGACATGACTCGTCGAGCACAGAACGGCTAT 


G7CGCCGATGTGGAGACTGCGGCCACAACGCTACACGCGGTTTTACTCCCTCTTCGACTTTCTTATGCTACTGTACTGAGCAGCTCGTGTCTTGCCGATA 



-eGFP.C e.unc53 



-C.e. unc53 



SGYTSDAGVAMCAKMREKLKEY0OMTRRA0N6Y 


Asu II ,Sst I BspM II 

i i r 

CC TGACAACTTCGAAGACAGTTCCTCCrTGTCGTCTGGAATATCCGArAACAACGAGCTCGACGACATATCCACGGACGATTTGTCCGGAGTAGACATG^ 

GGACTGTTGAAGCTTCrGTCAAGGAGGAACAGCAGACCTTATAGGCTATTGTTGCTCGAGCTGCTGTATAGGTGCCTGCTAAACAGGCCTCATCTGTACC 



-eGFP.C.e.unc53 



-C.e. unc53 



* 0 N F EC5SSL3SGIS0NNEL00IST0DL3GVOM 

CAACAGTCGCCTCCAAACAfAQCGAC TATTCCCAC TT7GTTCGCCATCCC ACGTCTTCTTCCTCAAAGCCCCGAGTCCCCAG TCGGT CC TCC AC ATCAG * 
G " T<j rCAGCGGAGG TTTGT ATCGCTGATAAGGG'G AAAC AAGCGGTAGGG TGCAGAAGAAGGAGTTTCGGGGC rCAGGGGTCAGCCAGGAGo 7GTAG "CA 



-eGFP.C e.unc53 



C.e. unc53 

A ^VAS<^S0■YSVr VRHPTSSSStCPRVPSRSS T S V 



fig 30 pEGFP72 (1 > 9697) Site and Sequence

8gi I 



Nar I 

I i r 

jTGGCGCC G 

GCTAAGAGCTAGAGCTCGTCTTGTCCTCTTACACATGTTTGAAGACAGGGTCACGGCrTGCTCGGTTGCACCGCGGCGACGGrGGAGTTGGAAGCCTGT" 



^<ho I j J pbe I 

CGATTCTCGATCTCGAGCAGAACAGGAGAATGTGTACAAACTTCTGTCCCAGTGCCGAACGAGCCAACGTGGCGCCGCTGCCACCTCAACCTTCGGACAA 



-eGFP.C.e.unc53 



-C.e. unc53 



DSRSRAEQENVYKLLSQCRTSGRGAAATSTFGq 

Xma I Spe I

Sma I Pvu II Sal I

I ! I I I 

CATTCGC TAAGATCCCCGGGATACTCATCCTATTCTCCACACTTATCAGTGTCAGCTGATAAGGACACAATGTCTATGCACTCACAGACTAGTCGACGAC 

i ■ i . » ' ' ■ ■ ■ ' ' ' ' 1 ■ t ■ h 37C0 

GTAAGCGATTCTAGGGGCCCTATGAGTAGGATAAGAGGTGTGAATAGTCACAGTCGACTATTCC TGTGTTACAGATACGTGAGTG TCTGATCAGCTGCT3 



-eGFP.C.e.unc53



-C.e. unc53 



HSLRSPGYSSYSPHLSVSADKOTMSHHSQTSRR 
CTTCTTCACAAAAACCAAGCTATTCAGGCCAATTTCATTCACTTGATCGTAAATGCCACCTTCAAGAGTTCACATCCACCGAGCACAGAATGGCGGCTCT 


GAAGAAGTGTTTTTGGTTCGATAAGTCCGGTTAAAGTAAGTGAACTAGCATTTACGGTGGAAGTTCTCAAGTGTAGGTGGCTCGTGTCTTACCGCCGAGA 



-eGFP.C.e.unc53 



-C.e. unc53 



pssokpsysgqfhsldrkchloeftstehrmaal 

Bam HI 

I 

C "TGAGCCCGAGACGGGTGCCGAAC TCGATG7CGAAA7A TGATTCT TC AGGATCCTACTCGGCGCGTTCCCGAGGTGGAAGC TCTAC TGGTA7C TATGGA 
G.^aCTCGGGC TCTGCCCACGGCTTGAGC TACAGCTTTATACTAAGAAGTCCTAGGArGAGCCGCGCAAGGGCTCCACCTTCGAGATGACCATAGATACC- 



-eGFP.C.e.unc53



- C.e. unc53 



L SPRRVPNSM3K YDSSGSYSARSRGGSSTG I Y G 

Bam HI Nhe I Nde I

I : i 

GAG ACGT TCCAACTGCACAGACTATC CG ATGAAAAAT" CCCGC AC A r 7C TGCCAAAAGTGAGATGGGA7CCCAAC TA TC AC T3 5C7AGCACGACA3C A~ 
C7: 7GCAAGGT7GACG7GTC7GA7AGGC7AC7 7:7 7AGGGGGCGTG7AAGACGGTTT7CAC7C7ACCC7AGGGr7GA7AG7GACC3A7CGTGCTGTCGTA 



eGFP.C e unc53 



" C.e. unc53 

rrCtHPLSDE<SPAHSAKS£MGSQL3LA3:TA 
WO 98/24810

fig 30 pEGFP72 (1 > 9697) Site and Sequence

PmaCI 

] PmaCI jSal I 

A7GGA7C TCTCAATGAGAAGTACGAACA7GCTA77CGGGACATG GCACG7GACTTGCAGTGTTACAAGAACAC TGTCGAC TCACTAACCAAGAAACAGGA 
TACCrAGAGAGrTACTCTTCATGCTTGTACGATAAGCCCTGTACCGTGCACTGAACCTCACAATGTTCrTGrGACAGCTGAGTGATTGGTTCTTTGTCC' "'^ 



-C.e. unc53 



YGSLNEKYEHAIROMAROLECYKNTVOSLT 



K K 0 E 



Ear I 

Hind III pa I Ksp632l 

GAACTATGGAGCATTGTTTGATCTTTTTGAGCAAAAGCTTAG 

CTTGATACCTCGTAACAAACTAGAAAAACTCGTTTTCGAATCTTTTGAGTGAGTTGTGTAAC TAGC TAGGTTGAACTTCGGAC TTCTCCGT7ATGCTAA3 



- eGFP.C.e.unc53 



- C.e. unc53



NYGALFOLFEQKLRKLTQHIORSNLKPEEAIRF 


AGGCAGGACATTGCTCATTTGAGGGATATTAGCAATCATCTTGCArcCAACTCAGCTCATGCTAACGAAGGCGCT GGTGAGCTTC TTCG TCAACCA TCTC 
TCCGTCCTGTAACGAGTAAACTCCCTATAATCGTTAGTAGAACGTAGGTTGAGTCGAGTACGATTGCTTCCGCGACCACTCGAAGAAGCAGTTGGTAGA3 



-eGFP.C.e.unc53 



-C.e. unc53 



q Q C I AH LRD I SNHLASNSAHANEGAGELLRQP3 

Ear I 

Cla I Cla I Sst I Ksp632I

' * i i 

7GGAATC AGTTGCATCCCATCGATCA TCGATGTC ATCGTCGTCGAAAAGCAGCAAGCAGGAGAAGATCAGCTTGAGCTCGTTTGGCAAGAACAAGAAGAG 
AC:rTAGTCAACGTAGGGTAGC7AGTAGCTACAGTA3CAGCAGCTTTTCGTCGTTCGTCCTCTTCTAGTCSAACTCGAGCAAACCGTTCTTGTTCTTCT: 



-eGFP.C.e.unc53



-C.e. unc53



IE SVASH RSS MSSS5 :< S S K 0 E K I 3 L S 5 F G K N '-. K 5 

Bam HI Nde I BspM II

C '3GATCCGC TCC TC AC TC TCCAAG 7'C AC C AACAA3 AAGAACAAGAAC T ACGACGAAGC AC AT A TGC CATC A AT T TCCGGAfCTCAAGGAAC TCT TGAI 
G AC C TAG GC G AGGAG TG AG AGG 7 TC A AG TGGTTCTTC TTC T TG7TC TTGA TGC TGC T TCG 7G7ATACGG7A3TTAAAGGCCTAGAGTTCCT'7GA3AACT*J 



-eGFP.C.e.unc53



— C.e. unc53 

3SLSK r TK<fc:NKNY0EAHM55 I 3 G S 0 'J T I D 
fig 30 pEGFP72 (1 > 9697) Site and Sequence






Sst I ApaL I 

AACATTGATGTGATTG AGT TGAAGCAAGAGCTCAAAGAACGCGATAGT GCACTTTACGAAGTCCGCCTTGACAATCTGGATC3"3CCCGC 3mmGTTGATI: 
T7GTAAC rACACTAACTCAACTTCGTTCTCGAGTTTCTTGCGCTATCACGTGAAATGCTTCAGGCGGAACTGTTAGACCTAGCACGGGCG;TTC4ACTA: 



-eGFP.C e.unc53 



-C.e. unc53



N IOV tELKOELKEROSALYEVRLONLORAREVD 

ttctgagggagacagtgaacaagttgaaaaccgagaacaagcaattaaagaaagaagtggacaaactcaccaacggtccagccactcgtg:ttcttccc2 
a ag ac tccctctgtcacttgttcaacttttggctcttgttcgttaatttctttcttcacctg tttgagtggttgccaggtcggtgagcacgaagaaggg: 



-eGFP.C.e.unc53 



-C.e. unc53 



VLRETVNKLKTENKOLKKEVOKLTNGPATRASSP 


Kspl psrl |Asu II 

cgcctcaattccagttatc tacgacgatgagcatgtcta tgatgcagcgtgtagcagtacatcagctag tcaatcttcgaaacgatcctctggc tgcaac 

gcggagttaagg TCAATAGATGC TGCTACTCGTACAGATACTACGTCGCACATCGTCATGTAGTCGATCAGTTAGAAGCTTTGCTAGGAGACCGACGTT3 



-eGFP.C.e unc53 



-C.e. unc53 



AStPVlYOOEHVYOAACSSTSASQSSKRSS 



C M 



Pvu I

I Hpa I EcoR V 

! ! ! 

TCAATCAAGG TTAC TGTAAACGTGGAC ATCGCT3GAG AAATC AGTTCGATCGTTAACCCGGACAAAGAGATAATCG TAGGATA7CTTGCC A'GTCAACCh 
AGT TAG T TCC AATGAC ATT TGCACCTGTAGCGACCTC TTTAGTCAAGCTAGCAATTGGGCCTGTTTCTCTATTAGCATCCTATaGAACGGThCAGTTGG" 



-eGFP.C.e.unc53 



-C.e. unc53 



S IKVTVNVC1AGE ISS IVNPOKc I I V G V L A K S T 

Cla I

G TC AG TC A7GC7GG AAAGACAT TGATG T TTC7A7 TCT A3GAC TATTTGAAG TC TACC TATCC AG AA 7 TG A TGTGGAGC A TC A AC T TGGAAT ZGA 7GCTCC 
CAGTCAGTACGACCTTTCTG7AACTACAAAGATAAGAT:CTGATAAAC TTC AGATGGATAGG TCT7AACTACACCTCGTAGTT3AACCT7A3CT ACGAGC 



-eGFP.C e unc53 



S C V K 0 ( 0 V S ; 



C.e. unc53 — — —

LFEVYLSR IDVEHQLGIOAF 
fig 30 pEGFP72 (1 > 9697) Site and Sequence






u 1 



rG ArrCTATCCrTGGCrATCAAATTGGTGAACTTCGACGCGTCATT GGAGACTCCACAACCATGATAACCAGCCATCCAACrGACATTCTTACrTCCrCM 
ACTAAGArAGGAACCGATAGTTTAACCACTTGAAGCTGCGCAGTAACCTCTGAGGTGrTGGTACTATTGGrCGGTAGGTTGACTG TAAGAATGAAGGAG" 



5ta 



-C.e. unc53 



OS I LGYQ ( G E L R R V (GDSTTM I TSHPTO I L T S . > 



actacaatccgaatgttcatgcacggtgccgcacagagtcgcgtagacagtctggtccttgatatgcttcttccaaagcaaatgattctccaactcgtch 
tgatgttaggcttacaagtacgtgccacggcgtgtctcagcgcatctgtcagaccaggaactatacgaagaaggtttcgtttactaagaggttgagcag" 



-eGFP.C,e.unc53 



-C.e. unc53 



TTIRMFMHGAAOSRVOSLVLDMLtPKQMILQ 



L V 



A.atll fisrl jBsrl ^ su , ( 

AGrCAATTTTGACAGAGAGACGrCTGGTGTTAGCTGGAGCA ACTGGAATTGGAAAGAGCAAACTGGCGAAGACCCTGGCTGCrTATGTATCTATTCGAAC 
TCAGTTAAAACTGTCTCTCTGCAGACCACAATCGACCTCGTTGACCTTAACCTTTCTCGTTTGACCGCTTCTGGGACCGACGAATACATAGATAAGCTTj 



-eGFP.C e unc53 



-C.e. unc53 



KSILTERRLVLAGATGIGKSKLAKTLAAYVSIRT 



Bsm I



Bgl II



"AATC AATCCGAAGATAGTATTG TTAA TATCAGCATTCC TGAAAACAATAAAGAAGAATTGC TTCAAG7GGAACGACGCC TGG AAAAGATC TTGAGAAGC 
7 7TAGTTAGGCT 7CTATCATAACAA TT AT AG~CQTAAG5 AC TTTTG TTATTTCTTCTTAACGAAGTTCACCTTGC TGCGGACCTTTTC7AGAAC TCTTCc 



- eGFP.C.e.unc53



-C.e. unc53 



N Q3ED5lVNI3lPENNKEELLQVERRlEKI 



L R 



Ava III

Nsi I ; Xba I 

* a AG A AT C A TGC A TCG T AA T7C TA3 A7AAT A7QCC AAA3 AA TC GAA TTGC A TTTGTTG TATCCGT 7 TT 7GC AAA7G TCCC AC f 7C AAAACAAlU AA GG JZ 
" 7TC TTAGTACG'AGC ATTAAGA7C 7 A 7 TA7AGGG77 7C TT AGC TTAACG T AAACAAC ATAGGC AAAAAC3 TT TAC AGGGTGAAG fT TTGT TGC 7TCCA5 



• eGFP.C. e.unc53 



E S C I y f l C N 



C.e. unc53 — 

MR I A F V V S V r A N V P L 0 H fl E G 






WO 98/24810 PCT/EP97/06956



Tuesday, 18 November 1997 10:35




EcoR V



P 

CATTTGTAGTArGCACAGTCAACCGATATCAAATCCCTG AGC TTCAAATTCACCACAATTTC AAAATGTCACTAATGTCCAA TCGTC TCGAmGGATTCA" 
GTAAACArCATACGrGrCAGTTGGCTATAGTTTAGGGACrCGAAGTTTAAGTGGTGTTAAAGTTTrACAGrCArTACAGCTTAGCAGAGCTn 



CCTAAGTm 



-eGFP.C.e.unc53 



-C.e. unc53 



PPVVCTVNRVOIPELOIMHN 



F * » S V M S N R L £ G F I 



Ear I 
Sst I Ksp632I

CCTACGTTACCTCCGACGACGGGCGG rAGAGGATGAGTATCGTCTAAC TGTACAGATGCCATCAGAGCTCTTCAAAAfCATTGAC TTCTT CCCAATAGCT ' 
GGATGCAATGGAGGCTGCTGCCCGCCATCT^ 57C-: 



" C.e. unc53 

LRYLRftPAVEOEYRLTV 



QMPSELFK I I 0 F F P I 



Ear I 
Ksp632I



EcoR I 



Sph I 



Bam HI



C 'TCAGGCCGTCAATAATTrTATTGAGAAAACGAATTCTGTTGATG TGACAGTTGGTCCAAGAGCATGCTTGAAC TGTCCTCTAACTGTCGATG GATCCC 
GAAGTCCGGCAGT TATTAAAATAACTCTTTTGCTTAAGACAACTACAC TG TCAACCAGGTTC TCGtACGAAC TTGACAGGAGArTGACAGC TACCTAGGS ^ 

-eGFP.C.e.unc53




-C.e. unc53 



lqavnnf iekt nsvovtvgpraclncplt 



V D G S 



gtgaatggttcattcgattgtggaatgagaacttcattccata^ 

CAC ^^ a CCAAG~AAGCTAACACC7TACTCTTGAAGTAAGGTATAAACCTTGCACAACGATC TCTACCGTTTTTTTGGAAGCCAGCGACGTGAAGGAAi"rT 



-C.e. unc53



R E V F ! » L V ,N E N F 



:p YLERVARDGKKTFGR 



C T S F E 



Bam HI



Jth I 



Jth I 



'jGATCCCACCGA CATCG "C TCTAAAAAA FGGCCGTGc ^rCGATGGTGAAAACCCGGAGAATGrGC "CAAACGTC TTCAAC TCCAAGACC TCoT CCGTCA 
'^GGGTCCCTG UGCAGAGATT^ 




C P 



■ C.e. unc53 — — - 

0 1 V S * K wf, vF DGENPENVL<RLOLQO 




BspM I Xho I Sph I 

I I I 

CCTGCCAACTCATCCCGACAACACTTCAATCCCCTCGAGTCGTTGATCCAATTGCATGCTACCAAGCArCAGACCATCGACAACATTTGAACAGAAGACr 

, , I'!' ■ 1 ■ 1 ' ' 1 1 ■ i ~» k oi y. 

GGACGGTTGAGTAGGGCTGTTGTGAAGTTAGGGGAGCTCAGCAACTAGGTTAACGTACGATGGTTCGTAGTCTGGTAGCTGTTGTAAACTTGTCTTCTGA 



-eGFP.C.e.unc53



> . 

-C.e. unc53 ' 



PANS5RQHFNPLESL I Q L H A T K H Q T I D N. ( . T E p 

Asp 718 

i r* 

CTAATCTTCTCTCGCCTCTCCCCCGCTTTCCTTATCTTCGTACCGGTACCTGA TGATTCCCCATTTTCCCCCTTTTCCCC CCAATTTCCCAGAACCTCC7 

» i ■ i 1 i i ' i i ■ i i i i i i i i i t i 52 C»" 
GATTAGAAGAGAGCGGAGAGGGGGCGAAAGGAATAGAAGCATGGCCATGGACTACTAAGGGGTAAAAGGGGGAAAAGGGGGGTTAAAGGGTCTTGGAGGA 

SNLLSPLPRFPYLRTGT. . FP CFPLFPP ISQNLL 
Xma I 

J jSma I pra I jlmn I 

GTTCCCTTTGTTCCTAGTCCTCCCGGGTGCCGACGCCGAAGCGATTTAAAAACCTTTTTCTTTCCGAAACATTTCCCATTGCTCATTAATAGTCAAATTG 

, 1 — . — 1 ' 1 ' ' 1 1 1 1 . ■ i 5jc<: 

CAAGGGAAACAAGGATCAGGAGGGCCCACGGCTGCGGCTTCGCTAAATTTTTGGAAAAAGAAAGGCTTTGTAAAGGGTAACGAGTAATTATCAGTTTAAC 
FPLFLVLPGAOAEAI .KPFSFRN ISHC SLI V K L 

Xma I 
Sma I

^ho I ]!j ,Bam HI Xba I .Bel I 

I Hi i i I 

AATAAACAGTGTATGTACTTAAAAAAAAAAAAAAAAAAACTCGAGGGGGGGCCCGGGATCCACCGGATCTAGATAACTGATCATAATCAGCCATACCACA 

, , i i i ' i i 1 ' i ■ h s±c<: 

TTATTTG TCACATACATGAATTTTTTTTTTTTTTTrTTTGAGCTCCCCCCCGGGCCCTAGGTGGCCTAGATCTATTGACTAGTATTAGTCGGTATGGTGT 
NKOCMYLKKKKKKLEGGPGIHRl ITDHNGPYH 

Pra I 3sm I Hpa I 

i \ \ 

TTTGrAGAGGTTTTACTTGCTTTAAAAAACCTCCCACACCTCCCCCTGAACCTGAAACATAAAATGAATGC AATTG TTGTTG TTAAC TTGTTTATTGCAS 
AAACArCTCCAAAATGAACoAAATTTTTTGGAGSGTGTGGAGGGGGACTTGGACTTTGTATTTTACTTACGTTAACAACAACAATTGAACAAATAACGT; 
ICRGFTCFKKPP TPPPEPET. NECNCCC . L V Y C > 

Bsm I 

! 

C TTaTAATGG TTACAAA7AAAGC AATAGCATCACAAA TT rc ACAAATAAAGCATTTTTTTCACTCC ATTCTAG TTGTGGTTTG TCCA AACTC ATCAATOT 
GAATATTACCAATGTTrATTTCGTTATCGTAGTGTrTAAAGTGTTTATTTCGTAAAAAAAGTGACGTAAGArCAACACCAAAC AGGTTTGAGTAGTTACA 
L VLQIKO.HHKFHK . S I F F T A F L V F V Q T H Q C 

Mlu I Ssp I 

] i 

ATCTTAACGCGTAAATTGTAAGCGrTAATATTTTGTTAAAATTCGCGrrAAATTTTTGTTAAATCAGCTCATTTTTTAACCAATAGGCCGAAAT CGGCAA 

— i i ■ - ...I i ■ . i. i i i i i i ■ ■ a . ii « ■ ■ ■ ■ ■ -i i t i i r i j ■ i .i t i i •r ; 

TAGAArTGCGCATTTAACATTCGCAATTATAAAACAATTrTAAGCGCAATrrAAAAACAArTTAGTCGAGrAAAAAArTGGTTATCCGGCTTTAGCCGTT 

ILTRKL.ALIFC.NSR !FV<SAHFLTNRPKSA 






Bsrl 



AATCCCT 



TATAAATCAAAAGAATAGACCGAGATAGGGTTGAGTGTTGrTCCAGTTTGnAArAAr.Ar. 



TCC AC TA TTAAAGAACG TGGAC 7CCAACGTCAAA 



TTAGGGAATATTTAGTTTTCTTATCTGGCTCTATCCCAACTCACAACAAGGTCAAACCTTGTTC TCAGGTGATAATTTCTTGCACCTGA"6i 



K S L INOKNRPR 



VLFQFGTRVH 



STTGCAGTTT 
* T V T P T s , 




G £ * p S IRAMAHYVNHH 



PNQVFWGRG 



ATTTCGTGATTTAGCCTTGu 
A V K h" . I G T 



Vae I 



CTAAAGGGAGCCCCCGATTTAGAGCTTGACGGGGAAAGCC GGCGAACGTGGCGAGAAAGGAAGCGAACAAAGCGAAA GGAGCGGGCnrr^fififtrrtrT^^.- 
LKG APQLEt DGESRRTWRERKGW 



r 



spl 



Kspl 



TTCACATCGCCAGf GCGACGCGrATTRft T/trtTftTm^r-i-r**-^**^ » 1 ' _ » 



Q V 



CATCGCCAGfGCGACGCGCATTGGTGGTGTGGGCGGl 
RSRCA pp HP 



CGCGAATTACGCGGCGATGTCCCGCGCAGTCCACCGTGAi 



AAAGCCCCT7TACACGCGCC7 



7 1 C'." 




PLFVyFSK 



Y f 0 I C 1 R S . 0 N N 



TATAACTT7TTCCTTCTCA3 
P 0 K C F N N I E K G R 



QxaN I 



Pvu II



iSph I 
Ava III



AGTATGCAAAGCArGCATCTCAAT 



*" *^^^^^^^^^"^^ A ^* A ^^^^^ A ^^^ A ^^^ A ^ 7CCCACACC 7TTCAGGGG7CCGAGGGG7CGTCCG 7CTTCATACG7TrCG"ACG7AGAl 
' ^^VECVSVR V W K V P R L p S R Q , y , , „ a c 



I aar CAGCAACCAGGTGTGSAAAGTCCCCAGGC7CC = CA6CAGGrAr. A Ar:T 



.Sph I 
Ava III 
Nsi I 



L 7 S N 0 



V W x V P R L P S R Q K 



irACGT7TC37ACGTAGAGT7AA7CAGTCG7TGGTATCAG3GCGGGGA7- ^ 



Bsrl 



Nco I 



Bgl I 

Sfi I 



£I£L^^ _ 

j*-o»ji_G»ju i A'jGGCGGoo-TTGAGjCG35TCAA3GCGo3 7AAGAGGCGGG5 TACCGAC TGAT fA, 



' A H P A P 5 AOFRp FsA p 



iAAAAAAfAAATACGTC rcCGGCTCC33C23AGCC 3 
_ V L r W F P Y L C * G R 3 L G 

Stu I 

j ^vr II pla I 

CTCTGAGCTATTCCAGAAGTAGTGAGGAGGCT TTTTTGGAGGCCTAGGCTm ^ _ 

GAGACTCGATAAGGTCTTCATCACTCCTCCGAAAAAACCTCCGGATCCGAAAACGTTTCTAGCTAGTTCTCTGTCCTACTCCTAGCAAAGCGTACTiAC: 
L AtPEVVRRLFWRP RLLQR SIK RQOED RF A.L 

0spM I Xma III 

ACAAGATGGATTGCACGCAGGT TCTCCG6CCGCTTGGGTGGAGAGGCTATTCGGCTATGACTGGGCACAACAGACAATCG6CTGC tc tgatgccgccgtg 
TGTTCTACCTAACGTGCGTCCAAGAGGCCGGCGAACCCACCTCTCCGATAAGCCGATACTGACCCGTGTTGTCTGTTAGCCGACGAGACTACGGCGGCAC 
NKMOCTOVL R PLGVRGY SAM TG HNR QS AAL MPPC 

i Nar Bbel MP" 

i i 1 

TTCCGGC TGTCAGCGCAGGGG CGCCCGGTTCTTTTTGTCAAGACCGACCTGTCCGGTGCCCTGAATGAACTGCAAGACGA GGCAGCGCGGCTATCGTGGC ^ 
AAGGCCGACAGTCGCGTCCCCGCGGGCCAAGAAAAACAGTTCTGGCTGGACAGGCCACGGGACTTACTTGACGTTCTGCTCCGTCGCGCCGATAGCACCG 
SGCQRRGA RFFLSRPTCPVP. H N C K T R 0 R G Y R G 



Fspl 

0al I j Pvu II jTth I 

TGGCC AC GACG6GCGT TCC TTGCGC AGC TGTGCTCGACGTTGTCAC TGAAGCGGGAAGGGACTGGCTGCTATTGGGCGAAGTGCCGGGGCAGGA7CTCCT 
ACCGGTGCTGCCCGCAAGGAACGCGTCGACACGAGCTGCAACAGTGACTTCGCCCTTCCCTGACCGACGATAACCC6CTTCACGGCCCCGTCCTAGAGGA 
VP R R A F L AQL CST LSLKREG TGC YVAKCRG R IS 

pspM I 

i 

GTCATCTCACCTTGCTCCTGCCGA GAAAGTATCCATCATGGC^^ 

CAGTAGAGTGGAACGAGGACGGCTCTTTCATAG3TAGTACCGACTACGTTACGCCGCCGACGTATGCGAACTAGGCCGATGGACGGGTAAGCTGGTGGT7 
CHLTULLPR<Y?S VLM0 CGG CIRL1RL PAH STT> 

Ear I 
Ksp632I

i 

GC3AAACATCGCATCGAGC3AGCACG7ACTCGGArG3AAGCCGGTCTTGTC6ATCAGGATGArCTGGACGAAGAGCATCAGGGGCTCGCGCCAGCCoAA: ^ _ 
CGCTTTGTAGCGTAGCTCGCTCGTGCATGAGCCTACCTTCGGCCAGAACAGCTAGTCCTACTAGACCTGCTTCTCGTAGTCCCCGAGCGCGGrcSCCTTG 
RNIAS5EHVLGVKPVL S I R M I W T K S \ R G S R Q P » 

.Sph I Nco I 

■ i 

:gttcgccaggctcaaggc3agcat3'cccgacg:cgaggatc tcgtcgtgacc'catggcgatgcctgcttgccgaatatcatggtggaaaatggccgctt 

ACAAGCGGTCCGAGTTCCGCTCGTACGGGC TGCCCICTCC TAGAGCAGCACTGGGTACCGCTACGGACGAACGGCTTA7AGTACCACCTTTTACCGGCGAA 
"CSPGSRRACPTAR ISS .PMAMPACRIS^VK M A A 



Ear I 

Nae I Rsr II ; Ksp632I

1 I 1 

"TrTGGATTCArCGACTGTSGCCGGCTGGGTGT^GCGGACCGCTATCAGGACATAGCGTTGGCTACCCGTGATATTGCTGAAGAGCT TGGCGGC^AA.'j^ 

AA3ACCTAAGTAGCTGACACCGGCC3ACCCACA::GC;T3GC3ATAGTCC rGTATC3CAACCGATGGGCACTATAACGACTTCTCGAACCGCCG:TrAC: 
F L 0 3 S T V A G V 7 - ■ * X A I R T . R V L P V t I I < 3 L A A fl G 
TGG3



GCTGACCGCTTCCTCGTGCTTTACGGTA TCGC CGCTCCCGATrCGCAGCGCATCGCCTTC TATCGCC TTCTTGACOAG TTCTrC TGAGCGGGACTr 
CGACrGGCGAAGGAGCACGAAATGCCATA^CGGCGAGGGC^ 3 ,;V 
1 T *■ 5 3 C / T V 3 P L P f R S ASPS I AFL TSS 



s e r o s 



Asu II 

: 
: 

GTTCG 



BspM t 



AAATGACCGACCAAGCGACGCCCAACCTGCCA TCACGAGATTTCGATTCCACCGCCGCCTTCTATGAA AGGTTGGGCTTCGGAATrrTTT-r^^.^.^ 

CAAGCrTrACTGGCrGGTTCGCTGCGGGTTGGACGGTAGTGCTCTAAAGCTAAGGTGGCGGCGGAAGATACrTTCCAACCCGAAGCCTTiGcTA^S "* 
VRN ORPSDAQPa I TRFRFHRRLL . k v g l r n r f 



f ao ' | Ks P' Avrl. 

GACGCCGGCTGGATGATCCTCCAGCGCGGGGATCTCA TGCTGGAGTTCTTCGCCCACCCTAGGGGGAGGCTAACTGAAACA CGGAAGGACArAATArrrc 

CrGCGGCCGACCTACTAGGAGGTCGCGCCCCTAGAGTACGACCTCAAGAAGCGGGTGGGATCCCCCTCCGATTGACTTTGTG CclTCCTCTGTTATGGC- 7 *** 
G » R L 0 0 P P A P G S H A G V L R p p , G £ A N . N T £ G p ^ t U g 

|<spl 

AAGGAACCCGCGCTATGACGGCAATAAAAAGACAGA ATAAAACGCACGGTGTTGGGTCGTTTGTTCATAAACGCGGGGTTC GGTCCCAGGCrTKfir arT.- 

TTCCTTGGGCGCGATACTGCCGTTATTTTTCTGTCTTATTTTGCGTGCCACAACCCAGC^ACAAGTATlTGCGCCCCAiGCCAGGGTCCCGACCGTGAG ^ 
NP R Y O GNKICTE . NARCW Vvc S . TRGSVPGLAL 

TGTCGATACCCCACCGAGACCCCATTGGGGCCAATAC GCCCGCGTTTCTTCCTTTTCCCCACCCCACCCCCCAAGTTCGGG TGAAGGCCCAGfiflrTrftrA 

ACAGCTArGGGGTGGCTCTGGGGTAACCCCGGTTATGCGGGCGCAAAGAAGGAAAAGGGGTGGGGTGGGGGGTTCAAGCCCACTTCCGGGTCCCGAGCG; ^ 
CRYPTETPLGPlRppFf^pp T ? P p < p (3 RPRAR 



f ,Wn ' i° xaN » pal p«, 

GCCAACGrCGGGGCGGCAGGCCCTGCCATAGCcVcAGG TTACrCATArATACTTTAGATrGArTTAAAACTTCATTTTTAA TTTAAAAGGArCrAGGr,, 
^TrGCAGCCCCGCCGTCCGGGACGGrATCGGAGTCCAATGAGTArATATGAAATCTAACTAAATTTTGAAGrAAAAATTAAATTTTCCTAGATCCA^ " K 
3QRRGGRPCHSt,RL L I YTLD . F K T S F L 1 • K 0 L G 

BspH 1 

A^rccTrrTTGArAATCTLTGACCAAAATCCCTTAACGTGAGTrrTCGrrCCACTGAGCGrCAGACCCCGTAGAAAAGArCAAAGGATCrTCT,,,, 



TCrAGGAAAAACTATrAGAGTACTGGTTrTAGGGAArTGCACrCAAAAGCAAGGTGACTCGCAGTCrGGGGCATCTTrTCTAGTTrCCTAGAAGAAcS 
D P F • S HOONPL T . VF VPLSVRPRBK OQR I FL R 

rC:rTrTrTTC7GCGCGTAATCTGCrGCrTGC AAACAAAAAAACCACCGCrACCAGCGGTGGTTTG7TTGCCG GATCAAGAGCTACrAACr C TTTTrr.--. 

A„AAAAAAAGAC 3 CGCArrAGACGACGAACGTTTGrTTrTTTGGTGGCGArGGrCG CCACCAAACAAACGGCCTAG T TCTCGArGGTTGAGAAAAAGGC r 
- r ' 5 4 " N 1 L L A » ' * T T A T S G G t f A G 3 R A T N S F S 

Bsrt 

ffi^TGGc'TrCAGCAGAGCGCAGArACCA AA-ACIGTCCrTCTAGTGrAGCCGTAGrTAGGCCACCACTTCAAGAACTC rGrAGCAC^CCrACA^ 
-^rTGACCGAAGrCGTCICGCGTCrATGGTTTArGACAGGAAGArCACATCGGCArCAArCCGGTGGTGAAGTTCTTGAGACATCGTGGCGGATGTA 

5 --■ L ° ° S A 0 T < ' : p » s v a v v « p p u 0 e L c 3 r a y , 




AlwN I

ACCTCGCTCTGCTAATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGACTCAAGACGATAGTTACCGGATAAGGCGCA 

i t i 1 ' i 1 ■ 1 ' ' 1 : 1 h ^yy 

TGGAGCGAGACGATTAGGACAATGGTCACCGACGACGGTCACCGCTATTCAGCACAGAATGGCCCAACCTGAGTTCTGCTATCAATGGCCTATTCCGCG" 
PRSANPVTSGCCQVR VVSYRVGLKT IVTG G A 

j*pall 

GCGGTCGGGCTGAACGGGGGGTTCGTGC AC AC AGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATACCTACAGCGTGAGCTATGAGAAAGCGCC 

' 1 ■ » 1 ' 1 1 ' ' 1 ► 3»CC 

CGCCAGCCCGAC TTGCCCCCCAAGCACGTGTGTCGG5TCGAACCTCGCTTGCTGGATGTGGCTTGACTC TATGGATGTCGCACTCGATACTCTTTCGCGG 

AVGLNGGFVHTAQLGANOLHRTE IPTA A M R K R 

■ i ■ . _ ^ . . 

ACGCTTCCCGAAGGGAGAAAGGCGGACAGGTATCCGGTAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATC 

1 | i I lil ■ \ i I ■ I ' I ' I ' i ' I t£C'-* 

TGCGAAGGGCTTCCCTCTTTCCGCCTGTCCATAGGCCATTCGCCGTCCCAGCCTTGTCCTCTCGCGTGC TCCCTCGAAGG TCCCCCTTTGCGGACCATA3 

HASRREKGGOVSGKROGRNRRAHEGASRGKRLVS 

I I - ' - ■ ■ 1. ■ 1 . I . -.1 ■ ■ . ■ • 

TTTATAGTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGCTCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGC 

—4 i i i i 1 1 • ' — ' ' 1 ' 

AAATATCAGGACAGCCCAAAGCGGTGGAGACTGAACTCGCAGCTAAAAACACTACGAGCAGTCCCCCCGCCTCGGATACCTTTTTGCGGTCGTTGCGCCo 

L SCRVSPPLT AS IFVMLVRGAEPMEKRQQRG 

Ava III 
Nsi I 

CTTTTTACGGTTCCTGGCCTTTTGCTGGCCTTTTSCTCACATGTTCTTTCCTGCGTTATCCCCTGATTCTGTGGATAACCGTATTACCGCCATGCAT 

i i i i i i i - i - i ■ i — | i i i * 96 S 7 

GAAAAATGCCAAGGACCGGAAAACGACCGGAAAACGA3TGTACAAGAAAGGACGCAATAGGGGACTAAGACACCTATTGGCATAATGGCGGTACGTA 

LFTVPGLLLAFCSHVLSCVIP FCG - PYYRHA 
Monday, 1 December 1997 14:12

[Figure: pCB501 sequence printout with nucleotide position ruler]

GACGGATCGGGAGATCTCCCGATCCCCTATGGTCGACTCTCAGTACAATCTGCTCTGATGCCGCATAGTTAAGCCAGTATCTGCTCCCTGCTTGTGTGTT

CT"CCTAGCCCTCTAGAGGGCTAGGGGATACCAGCTGAGAGTCATGTTAGACGAGACTACGGCGTATCAATTCGGTCATAGACCAuGGAC5AACACACAA 
T Q R E ! S R S P M V 0 S Q Y N L L ■ C R 1 V K P V S A P C L C V 

GGAGGTCGCTGAGTAGT GCGCGAGCAAAATTTAAGCTACAACAAGGCAAGGCTTGACCGACAATTGC ATGAAGAATC7GCTTAGG5TTAGGCGTTTTGC5 ^ 
CCTCCAGCGACTCATCACGCGCTCGTTTTAA ATTCGATGTTGTTCCGTTCCGAACTGGCTGTTAACGTACTTCTTAGACGAATCCCAATCCGCAAAACG- 



G R 



V V ft E 0 N L S Y N K A R L 0 R 0 L H E E 3 A G . A F C 



CTGCTTCGCGATGT ACGGGCCAGATATACGCGT7GACATTGATTATTGACTAGTTATTAATAGTAATCA ATTACGGGGTCATTAGTTCATAGCCCATATA ^ 
GACGAAGCGCTACATGCCCGGTCTATATGCGCAACTGTAACTAATAACTGATCAATAATTATCATTAGTTAATGCCCCAGTAATCAAGTATCGGGTA r «" 
AASRCTGQIY ALTLI IO.LtlV 1NY GV1SS .P 
TGGAGTTCCGCGTTAC ATAACTTACGGTAAATGuCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGAC GTCAATAATGACGTATGTTCCCATAGT ^ 
jvCCTCAAGGCGC AATGTATTGAATGCCATTTACCGGGCGGACCGACTGGCGGGTTGCTGGGGGCGGGTAACTGCAGTTATTACTGCATACAAGGGTATCA 
GVP RY I TYGKVPAVL T AQRPPP I 0 VNN0VCSH3 

aacgccaatagggactt tccattgacgtcaatgggtggactatttacggtaaactgc ccacttggcagtacatcaagtgtatcatatgccaagtacgcc- 5; . 
ttgcggttaIccctgaaaggtaactgcagItacccacctgataaatgccatttgacgggtgaaccgtcatgtagttcacatagtatacggttcatgcgg^ 

, A N R 0 F P L T S H G G L F T V N C P L G S T S S V S Y A K Y A 
CCTATTGACGTCAA TGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCC TACTTGGCAGTACATCTACGTATTAGTCA ... 

ggataactgcagttactgccatttaccgggcggaccstaatacgggtcaIgtactggaataccctgaaaggatgaaccgtcatgtagatgcataatcag- 

F Y . R q . a , M A R U A L C P V H D L H G L 3 Y L A V H L R I S H 

TC'.rTATTACCATGGT GAT 5C GGTTTTGGCAGTACaTCAATGGGCGTGGATAGCGGTTTGACTCACGGG GATTTCCAAGTCTCCACCCCATTGACGTCAA ^ 
AG'GATAATGGTACCACTACGCCAAAACCGTCATCTAGTTAC CCGCACCTATCGCCAAACTGAGTGCCCCTAAAGGTTCAGAGGTGGGGTAACTGC AGT" 
p y y H G D A V L A V H 0 V A V . A V . L T G . S K S P P H ■ R 0 

TGGGA GTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGA CGCAAATGGGCGGTAGGCGTGTACGGTGGCifc'.: ^ 
Ai'''"T^AAA'"AAAACCGT6GTTTTA3TTGCCCT3AAA5uWtIACAGCATTGTTGAGGCGGGGTAACTGC3TTTACCCGCCATCCGC 

>';FVIAPK5TGLSKMS . 0 L 3 P IQANGR ACTV'j 
S7C TATATAAGCAGAGC T C TCTGGCTAACTAGAIAACCCACTGCTTACTGGCTTATCGAAATTAATAC GACTCACTATAoGGAGACCCAAGC TGGCTAG- 
i"^GATATA7TCG *C TC3AGAGACC3A T TGATCT3"53GTGACGAATGACCGAArAGCTT 'A ATTATGC "SAOTGATATCCC TCTGGGTTCSACCGA'CG 

I _> 
I T7 promote* orimtnq site 1 

0 , y K 0 S S ■. A » ■ 3 T H C L L A Y R H . Y 0 3 L . G D P S V L A 
GT7TAAACTTAACC TTACCA TGGG3SGTTCTCA r CATCATCATCATCATGGTATGGCTA3CATGACTGG "33ACAGCAAATGGGTCGGflATCTGTAC'j.'i'.l . 

T^T^toaTtTcma^^ 



t.pr oBond bindinq domain 

F < L K L T H G 3 S H f H H H H G H A S H T 3 G Q 0 M G R D L Y D 
GA TGACGATAACGTACCTAGG ATCCATATGCCTCCTTGCCQTCGACGTGTC^ ^ _

C taCTGCTaTTCCATGGATCCT AGGTATACGGAGGAACGGCAGCTCCACASTTATTGTATAG TCAGAGGGAG7TTCCAGACTTCCTCTT7ACGCAGC7G" 



-U4 CRF 



^ODKVPRIHMPP C R R G V N N 1 S V S L K G L K E K C V 0 

GCCTGGTGTTCGAGACG CTGATCCCCAAGCCGATGATGCAGCACTACATAAGCCTCCTGC TGAAGCACCGGCGCCTCGTCCTCTCGGGCCCCAGCGGCA'J 
CGGACCACAAGCTCTGCGACTAGGGGTTCGGCTACTACGTCGTGATGTATTCGGAGGACGACTTCGTGGCCGCGGAGCAGGAGAGCCCGGGGTCGCCGTS 



■ pCB201 insert a U4 



-U4 0RF 



PMMQHY ISLLLKHRRLVLSGPSGT 



GGGCAAGACCTACCTGACC AATCGCTTGGCCGAGTACCTGGTGGAGCGC^ 

CCCGTTCTGGATGGACTGGTTAGCGAACCGGCTCATGGACCACCTCGCGAGACCGGCACTCCAGTGTCTCCCGTAGCAGTCGTGGAAGTTGTACGTGGT: 



-U40RF 



KTYLTNRLAEY 



LVERSGREVTEG JVSTFMMH 



C AoTCTTGCAAGGATCTGCAACTGTATCTTTCCAACCTAGC C AACCAGATA5ACCGGGAAAC AGGAATTGGGGATGTGCCCCTGGTGATTCTaTTGGAT2 
GTCAGAACGTTCCTAGACGTTGACATAGAAAGGTTG3ATCGGTTGGTCTATCTGGCCCTTTGTCCTTAACCCCTACACGGGGACCACTAA3ATAACC7A: 



- pCB201 insert = U4 



■U4 CRF 



; SCKOLQL Y L 5 N L A 



NQ 10RETG IGOVPLV i L L 0 



T jAGTGAAGCAGGCTCCATC AGTGAGTTGGTCAA7GG GGCCCTCACCT3CAAGTATCAT AAATGTCCC TATATTATaGGTACCACC AATCAGCC* 1 !*)" 
•o^AC rCACTTCGTCCGAGGTAGTCACTCAACCAG7TACCCCGGGAGTGGACGTTCATAGTATTTACAGGGATATAATATCCATGGTGfl r TAGTCG3AC- 



• pCB201 insert = U4 



-U4 ORF — ™ — ' 

SEAGSI S E L V \ G A L TCKYHKCPY I i G T T * 0 P 



^■XAAAfGACACCCAACC ATGGCTTGC AC TTGAGC T7C AG3ATGTTGAC CT TCTCCAACAACGTGGAGCCA3CC AATGGCTTCCTGGTTCGTTACCT3AGU 
:r*7TACTGTGGGTTGGTAZCGAACGTGAAC7C3AA3TCCT ACAACTGGAA3AGGTTG 7TGCACCT CGGTCGGTTACCGAAGGACCAAGCAATGGA:7C: 

- pCB201 insert ~ U4 



7 ? N H G L 



U4CRF — 

w (_ 5 F R M L 7 F S N M V E => A M 0 



r L 



Y L F 
« SC ^ C *" T CCCT S8 CCATCA3CCCAAC^

0L1.CTGTGTGAAGGGACCGGTAGTCGGGTTGTTCTG GTTAGTTTCGACATGGTGGACGGGGG TG GGTCGCACCCGGGAGTGTCGTAACnrtAnTO.^£^r..~." " 




: !■■«■ 



— " ■ V * ° 5 7 P S 5 L ° S ° » l " ^ " L (-KLQEAAMV 
'3 F G N H
TCTTAGCTCCTCCTCTCCCCTCTCCTCTTTCAGAGCACTCGCTCTCCAGCCCCAGGAGGAGAACAGGAGGGAGGAGGAGATGAAAGAGGAGGGACAGGT-

11 1 * 11 i i i i j i i i i i i i ■ | | ■>■»..•,- 

agaatcgaggaggagaggggagaggagaaagtctcgtgaccgagaggtcggggtcctcctcttgtcctccc tcctcctctactttctcc tccctgtccaa " w " 



-pCB201 insert a U4 



lllsplifostgspapggeqegggderggtg 



cttggtgctgtacctttgagaacttcctaggaaggaatggtggggtggcgtttgggaacttgtgccccctaaacacatttactggcctcctctaatgact 

1 i i i ■ ■ i i ! ■ i i i i t i i i i i i i ~ i «;•%- 

gaaccacgacatggaaactcttgaaggatccttccttaccaccccaccgcaaacccttgaacacgggggatttgtgtaaatgaccggaggagattactga " 



-pCB201 insert = U4 



svcctfenflgrnggvafgnlcplntftgll. .l 
ttggggaaaagatgattctgggtctttcccttgacttcttgtttcaatta^ 

aaccccttttctactaagacccagaaagggaactgaagaacaaagttaatgtttgaggacccgaaagacccctccccaagtcttttgtagttttgtgacg 



-pCB201 insert = U4 



vgkdosgsfp llvs 1 tnsvafvggvgk t s k hc 

agcagttcctaaatgattctcacaagcaaccctgagagagacagtcttgtgagggagatctgggggaggcaggaagctcctcagattttctcacagaccc 
tcg tcaaggatttactaagagtgttcgttgggactctctctgtcagaacac tccctctagaccccctccgtccttcgaggagtctaaaagag tgtc tggs 



- pCB201 insert = U4 



SSS M I L TSNPERD5L V R E I VGRQEAPQ IFSQT 

T TC CC AA TTCCATC AC C AC TGCCAAC AC TCGTCCGGAATTCTGCAGATaTCCAGCACAGTGGCGGCCGCTCGAGTCTAGAGGGCCCGTTTAAACCCGCTG 
AAGGGTTAAGGTAGTGGTGACGGTTGTGAGCAGGCCTTAAGACGTCTATAGoTCGTGTCACCGCCGGCGAGCTCAGATCTCCCGGGCAAATTTGGGCGa: 



-pCB201 insert =U4 



LPN5 I TTANTR^EFCR YPAQURPLCSRGPV. TR 
ATC AGCC TCGACTGTGCCTTCTAGTTGCCAGCCATCTGTTGTTTGCCCCTCCCCCGTGCCTTCCTTGACCC tggaaggtgcc actcccactgtcctttcc 

1 i . 1 i i ■ ■ ' ' ■ ■ ' ■ 1 1- ic-j-: 

tagtcggagctgacacggaagatcaacggtcggtagacaacaaacggggagggggcacggaaggaactgggaccttccacggtgagggtgacaggaaagg 

sa5tvpssc0psvvcpspvpsltlegatptvl3 

taataaaa rgaggaaartgcatcgcattgtctgagtaggtgtcattctattctggggggtggggtggggcaggacagcaagggggaggattgggaagaca 
attattttactccttt aacgtagcgtaacagactc atccac agtaagataa3accccccaccccaccccgtcc rgtcgttccccc tcctaacccttctg" 
nee iashcls3chs ilgggvggdskgedveo 

atagcaggc a tgctgggga tgcggtggg c tct a tggcttctg aggc gg aaagaac c agctggggctc tagggggt atc cccacgcgccc t3 tagcggcgc 
» — — 1 • 1 i 1 1 » ■ i > « ' i ii 3ce>: 

TATCGTCCGTACGACCCCT ACGCCACCCGAGATACC3 AAGACTCCGCCTTTCTTGGTCGACCCCGAGATCCCCCATAGGGGTGCGCGGGAC ATCGCCGCCj 
NSRHAGDAVGSMASEAERTSVGSRGYPHAPCSGA 

ATTAAGCGCGGCGGGTGTGGTGGTTACGCGCAGCGTGACCGCTACACTTGCCAGCGCCCTAGCGCCCGCTCCTTTCGCTTTCTTCCCTTCCTTTCTCGC- 
■ 1 . —h — i 1 ■ i « • ■ i 1 1- 

TAATTCGCGCCGCCCACACCACCAATGCGCGTCGCACTGGCGATGTGAACGGTCGCGGGATCGCGGGCGAGGAAAGCGAAAGAAGGGAAGGAAAGAGCGj 
LSAAGVVV7R3VTATLASALAPAPFAFFPSFLA 

ACGTTCGCCGGC T TTCCCCGTCAAGC TCTAAATCGGGGCATCCCTTTAGG3TTCCGATTTAG TGCTTTACGGCACC TCGACCCCAAAAAAC TTGATTAG j 

11 1 1 i t — !!■■■■>■ i i i — »- ■ ■ i i i ■ i ii - .. — + v* j»; 

tgc aagcggc cg a aaggggcag t tcgag att ta3ccc c 3 taggg aaatcccaaggct aaa tc ac gaaatgccg tggagctggggt ttttt5aac taatcc 

tfagfprqalnrg iplgfrfsalrhldpkkld 
GTGATGGTTCACGTAGTGGGCCATCGCCCTGATAGACGGTTTTTCGCCCTTTGACGTTGGAGTCCACGTTC TTTAATAGTGGACTCTTGTTCCAAACTG3

— i i i — ■ iii — » * 1 1 * ' — i.i • ■ r jj. ; . 

CAC TACC AAGTGCATCACCCGGTAGCGGGACTATCTGCCAAAAAGCGGGAAACTGCAACCTCAGGTGCAAGAAATTATCACCTGAGAACAAGGTTTGACw 

OGSRSGPSP . . tvfrpltlestffnsgllfqtg 



AACAACACTCAACCCTATCrCGGTCTATTCTTTTGATTTATAAGGGArTTrGGGGATTTCGGCCTATTGGTTAAA AAATGAGCTGATTTAACAAAAATTT 

TTuTTGTGAGTTGGGATAGAGCCAGATAAGAAAACTAAATATTCCCTAAAACCCCTAAAGCCGGATAACCAATTTTTTACTCGACTAAATTGTTTTTAAA ^ 
T T L N P [ S V Y S F 0L.GILGISAYVLKNEL1 .QKF 

aacgcgaattaattctgtggaatgtgtgtcagttagggtgtggaaagtccccaggc tcccgaggcaggcagaagtatgcaaagcatgcatctcaattag" 

TTGCGCTTAATTAAGACACCTTACACAC AGTCAATCCCACACCTTTCAGGGGTCCGAGGGGTCCGTCCGTC TTCATACGTTTCGTACGTAGAGTTAATCA ^ 
* A ** • P C G M C V S . G VESPQAPQAGRSHQSMHLN 

cagcaaccaggtgtggaaagtcc ccaggctccccagcaggcagaag tatgcaaagcatgcatctcaattaq tcagc.aaccatagtcccgcccctaactcc 

' 1 1 1 !■■ It II I I i i i I J|5.---»- 

gtcgttggtccacacctttcaggggtccgaggggtcgtccgtctt€atacgtttcgtacgtagagttaatcagtcgttggtatcagggcggggattgag3 
satrcgkspgspagrsmqsmhln. sat ivppltp 

gcccatcccgcccctaactccgcccagttccgcccartc t ccgccccatggctgactaattttttttatttatgcagaggccgaggccgcctctgcctct 
cgggtagggcggggattgaggcgggtcaaggcgggtaagaggcggggtaccgactgattaaaaaaaataaatacgtctccggctccggcggagacggaga 

P I P P L T P P S S A H S P PHG.LI FFIYAEAEAASA3 
GAGCTATTCCAGAAGTA GTGAGGAGGCTTTTTTGGAGGCCTAGGCTTTTGCAAAAAGCTCCCGGGAGCTrGTATATCCATTTTCGGATCTGATCAAGAGA 

' : ' 1 1 ' 1 ' ■ III I i I i | I . i ■ | | | , ; | ! ^£ fV 

C'CGATAAGGTCTTCATCACTCCTCCGAAAAAACCTCCGGATCCGAAAACGTTTTTCGAGGGCCCTCGAACATATAGGTAAAAGCCTAGACTAGTTCTCT 
E L F Q k . GGFFGGLGFCKKLPGAC IS I FG3DQE 

CAGGArGAGGATCGTTTCGCATGATTGAACAAGATGGATTGC ACGCAGGTTCTCCGGCCGCTTGGGTGGAGAGGCTATTCGGCTATGACTGGGCACAACA 
G~3CTAC TCC TAGCAAAGCGTACTAACTTGTTCTACCTAACGTGCGTCCAAGAGGCCGGCGAACCCACCTC TCCGATAAGCCGATACTGACCC6TGT75" 
T 3 • G S F R M I E Q5GLH AGSPAAVVERLFG Y 0 W A 0 0 

GACAATC jGCTGCTCTGATGCCGCCGTGTTCCGGCTGTCAGCGCAGGGGCGCCCGGTTCTTTTTGTCAAGA CCGACCTGTCCGGTGCCCTGAATGAACTj 

c - 3ttag:cgacgagactacggcggcacaaggccgacagtcgcgtccccgcgggccaagaaaaacagttctggctggacaggccacgggacttacttga: 
7 t s c s o a avfrlsaqgrpvlfvk tolsgal mel 

cwggacgaggcagcgcggcratcgtggctggccacgacgggcgtt ccttgcgcagctgtgctcgacgttgtcactgaagcgggaagggactggctgcta- 

G"CTGCTCCGTCGCGCCGATAGCACCGACCGG"GCTGCCCGCAAGGAACGCGTCGACaCGAGCTGCAACAGTGACTTCGCCCTTCCCTGACCGACGATh " 
C - A A R USVLATTGVPCAAVLOVVTEAGRD'JLL 

"'i3GCGA-GTGCC3335CAGGArCTCCTGTCATC 7CACC TTGCTCCTGCCGAGAAAG TATCCATCATG GCTGATGCAATGCGGCGGCTGCATACGC T7GA 
ACCCGCTrCACGGCCCCGTCCTAGAGGACAGTAcAGTGGAACGAGGACGGCTCTTTCATAGGTAGTACCGACTACGTrACGCCGCCGACGTATGCGAAC- 
L3EVP 3Q QLL SSHL A PAEKVS IMAOAMRRLHTLO 

TCCGGC T ACCTGCCCATTCGACCACCAAGCGAAAC ATCGCATCG AGCGAGCACGTACTCGGATGGAAGCCGGTCTTGTCGATC AGGATGATCTGGACG^A 
AG3CCGATGGACG33TAAGCTGGTGGTTCGCTTTGTAGCGTAGCTCGCTCGTGCATGAGCCTACCTTCGGCCAGAACAGCTAGTCCTACTAGACCTGCTT " J "'* 
P A T C P F D HQAKHR IERARTRME4GLV 0003LDE 

GA3CATC~GGGGCTC3CGCCAGCCGAACTGTTCGCC AGGCT^ fGC TTGL 

C "G TAG TCCCCGA3C3CGGTCGGCTTGACAAGCGGTCCGAGTTCCGCGC3 TACGGGCTGCCGCTCCTAGAGCAGCAC TGGG rACCGCTACGGACGA.i»^'J 
_2 " CGLAP AEL FARLKARMPOGEO-LVVTHGOACL 
CGAATATCATGG tggaaaatggccgcttttctggattcatcgac tgtggccggctgggtgtggcggaccgctatcagg acatagcgttggctacccgtga

GCTTATAGTACCACCTTTTACCGGCGAAAAGACCTAAGTAGCTGA^ 

P N 1 M V £ N G R F S G F 1 DC G RLG VADR YQQ I A L A T R D 

TATTGCTGAAGAGCTTGGCGGCGAATGGGC TGACCGCTTCCTCGTGCTTTACGGTATCGCCGCTCCCGATTCGCAGCGCA TCGCCTTCTATCGCCTTrTT 
ATAACGACTTCTCGAACCGCCGCTTACCCGACTGG 

1 A E E L G G E A 0 R F L V L Y G t A A P 0 S Q R I A F Y R L J 

GACGAGTTCTTCTGAGCGGGACTCTGGGGTTCGAAATGACCGACCAAGCGACGCCCAACCTGCCATCACGAGATT TCGATTCCACCGCCGCCTTCTATGA 

ctgctcaagaagactcgccctgagaccccaagctttactggctggttcgctgcgggttggacggIagtgctctaaagcta a70c 

0 £ F . F V ft G L V G S K . P T K R R P T C H H E IS I P P p P S M 
AAGGTTGGGCTTCGGAATCGTTTTCCGGGACGC^ 

ttccaacccgaagccttagcaaaaggccctgcggccgacctactaggaggtcgcgcccctagagtacgacctcaagaagcgggtg^ *** 

KG VAS£S FS 6 TPAG,SS SAG ISCVSS S P T P T C L L 
GCAGCTTATAATGGTTACAAATAAAGCAATAGCATCACAA^TTTCACAAATAAAGCATTTTTTTCACTGCATTCTAGTTC 

cgtcgaatattaccaatgtttatttcgttatcgtagtgtItaaagtgttIatttcgtaaaaaaagtgacgtaagatcaacaccaaacagg^ *** 

0 > J . M V T N K A I A S Q I S 0 I K H F F H C I L V V V C P N S S 

atgtatcttatcatgtctgtataccgtcgacctctagctagagcttggcgtaatcatggtcatagctgtttcctgtgtga^^ 

tacatagaatagtacagacatatggcagctggagatcgatctc^ *>o: 

" Y I " S V Y R R p L A R A VRMHGHSCFLCElvrRSQ 



TCCACACAACATACGAGCCGGAAGCATAAAGTGTAAAGCCTGGGGrGCCTAATGAGTGAGCTAACTCACATTAArTG CGTTG 

aggtgtgttgtatgctcggccttcgtatttcacatttcggaccccacggattactcactcgattgagtgtaattaacgcaac 5082 

F H T T Y E P E A - .SV<PGVPNE.ANSH.LRV 
1. A vertebrate protein homologue of an UNC-53
protein of C. elegans or a functional equivalent, 
derivative or bioprecursor thereof, which protein 
comprises an amino acid sequence having a 
statistically significant homology to the amino acid 
sequence of said UNC-53 protein of C. elegans 
illustrated in Figure 2. 

2. A vertebrate protein homologue of an UNC-53 
protein of C. elegans, which protein comprises an
amino acid sequence having one or more of sequence 
blocks A, B, C, D or E as illustrated in Figure 9a, or 
block F in Figure 12a or a sequence having a 
statistically significant homology therewith. 

3. A vertebrate protein homologue of an UNC-53 
protein of C. elegans, which protein comprises an 
amino acid sequence having one or more of sequence 
blocks A, B, C, D, E or F which differ from those 
blocks of Figure 9a or 12a only in conservative amino 
acid changes. 
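
By way of illustration only, the notion of sequences that differ "only in conservative amino acid changes" can be sketched as a residue-class comparison. The residue classes below are an assumed, commonly used grouping and are not defined by the claims; the function names are hypothetical.

```python
# Assumed residue classes for illustration; the claims do not define them.
CONSERVATIVE_GROUPS = [
    set("GAVLI"),  # small / aliphatic
    set("STCM"),   # hydroxyl or sulfur-containing
    set("FYW"),    # aromatic
    set("KRH"),    # basic
    set("DENQ"),   # acidic residues and their amides
    set("P"),      # proline kept on its own
]


def is_conservative(a, b):
    """True when residues a and b are identical or share an assumed class."""
    if a == b:
        return True
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def differs_only_conservatively(seq1, seq2):
    """True when two equal-length sequences differ only by same-class residues."""
    if len(seq1) != len(seq2):
        return False
    pairs = zip(seq1.upper(), seq2.upper())
    return all(is_conservative(a, b) for a, b in pairs)
```

Under these assumed classes, a hypothetical block KLDS would be treated as conservatively related to RIET, but not to KLDW.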

4. A vertebrate protein having an amino acid 
sequence encoded by the nucleotide sequence shown from 
nucleotide positions 1 to 6013 illustrated in Sequence 
ID No. 3. 

5. A vertebrate protein comprising an amino acid 
sequence which comprises one or more of the prosite 
signatures as illustrated in Figure 28 for each of 
said sequences of homology as claimed in claim 2. 

6. A vertebrate protein comprising an amino acid 
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sequence as claimed in any one of claims 1 to 6 which 
is a human protein or a mouse protein. 

7. A vertebrate protein having an amino acid 
sequence encoded by the nucleotide sequence shown in 
Sequence ID No. 4. 

8. A vertebrate protein homologue according to 
any one of claims 1 to 7 comprising an amino acid 
sequence as shown in Sequence ID No. 1 or an amino 
acid sequence which differs from the amino acid 
sequence shown in Sequence ID No. 1 in one or more 
conservative amino acid changes. 

9. A vertebrate protein homologue according to 
any one of claims 1 to 7 comprising an amino acid 
sequence as shown in Sequence ID No. 2 or an amino 
acid sequence which differs from the amino acid 
sequence shown in Sequence ID No. 2 in one or more 
conservative amino acid changes. 

10. A cDNA encoding a vertebrate homologue of 
UNC-53 protein of C. elegans according to any of 
claims 1 to 9. 

11. A cDNA according to claim 10 comprising a 
sequence of nucleotides encoding an amino acid 
sequence as shown in Sequence ID No. 1 or an amino 
acid sequence which differs from the amino acid 
sequence shown in Sequence ID No. 1 only in one or 
more conservative amino acid changes. 

12. A cDNA according to claim 10 comprising a 
sequence of nucleotides encoding an amino acid 
sequence as shown in Sequence ID No. 2 or an amino 
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acid sequence which differs from the amino acid 
sequence shown in Sequence ID No. 2 only in one or 
more conservative amino acid changes. 

13. A cDNA according to any of claims 10 or 11 
which cDNA comprises at least from nucleotide position 
1 to position 6013 of the sequence as shown in 
Sequence ID No. 3. 

14. A cDNA according to claim 10 or 12 which 
comprises the nucleotide sequence illustrated in 



15. A nucleic acid molecule capable of 
hybridising to the DNA sequences according to any of 
claims 10 to 14 under high stringency conditions. 

16. A DNA expression vector which comprises a 
cDNA as claimed in any of claims 10 to 14. 

17. A vector according to claim 16 which 
comprises a promoter of C. elegans UNC-53 protein or a 
vertebrate homologue thereof according to any of 
claims 1 to 9. 

18. A vector according to claim 17 wherein said 
promoter sequence is derived from a gene encoding a 
mouse or human homologue of an UNC-53 protein of 
C. elegans. 

19. A vector according to any of claims 16 to 18 
which further comprises a sequence encoding a reporter 
molecule. 

20. A vector according to claim 19 wherein said 
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reporter molecule is a fluorophore. 

21. A host cell transformed or transfected with 
the vector of any of claims 16 to 20. 

22. A host cell transformed or transfected with 
the vector of claims 19 or 20. 

23. A host cell according to claims 21 or 22, 
which cell comprises a prokaryotic cell such as a 
bacterial cell or a eukaryotic cell such as a fungal, 
an animal, a plant or an insect cell. 

24. A transgenic cell, tissue or organism 
comprising a transgene capable of expressing a protein 
according to any of claims 1 to 9. 

25. A transgenic cell, tissue or organism 
according to claim 24 which comprises any of a COS 
cell, Hep G2, MCF-7 cell, N4 mouse neuroblastoma cell, 
a NIH3T3 cell, or colorectal carcinoma or human 
derived cells. 






26. A transgenic cell, tissue or organism 
according to claim 24 or 25 wherein said transgene 
comprises a vector according to any of claims 16 to 
20. 



27. A transgenic cell, tissue or organism 
according to any of claims 24 to 26 wherein said transgene 
comprises a vector according to claim 19 or 20. 

28. A transgenic cell, tissue or organism 
according to any of claims 24 to 26 wherein said 
organism comprises any of an insect, a fungus, a non-human 
mammal, a plant or a nematode worm. 

29. A method of producing a mutant vertebrate 
non-human organism which mutation affects cell 
behaviour or the regulation of cell motility or the 
shape or the direction of cell migration, which method 
comprises inducing a mutation in the wild type gene 
encoding the vertebrate homologue of an UNC-53 
C. elegans protein. 

30. A vertebrate protein homologue of an UNC-53 
protein of C. elegans, or a functional equivalent, 
derivative, fragment or bioprecursor thereof, for use 
as a medicament to promote neuronal regeneration, 
revascularisation, wound healing or for treatment of 
chronic neuro-degenerative diseases or acute traumatic 
injuries or fibrotic disease. 

31. A vertebrate protein homologue of an UNC-53 
protein of C. elegans for use as claimed in claim 30 
wherein said vertebrate human homologue is as claimed 
in any one of claims 1 to 9. 

32. Use of a vertebrate protein homologue of an 
UNC-53 protein of C. elegans, or a functional 
equivalent, derivative, fragment or bioprecursor 
thereof, in the manufacture of a medicament for 
promoting neuronal regeneration, revascularisation, 
wound healing or for treatment of chronic 
neurodegenerative diseases or acute traumatic injuries 
or fibrotic disease. 

33. Use of a vertebrate protein homologue of 
UNC-53 protein of C. elegans according to claim 32 
wherein said vertebrate protein homologue is as 
claimed in any one of claims 1 to 9. 

34. A pharmaceutical composition comprising a 
vertebrate homologue of an UNC-53 protein of 
C. elegans, or a functional equivalent, derivative, 
fragment or bioprecursor of said vertebrate protein 
together with a pharmaceutically acceptable carrier, 
diluent or excipient therefor. 

35. A pharmaceutical composition as claimed in 
claim 34 which comprises a vertebrate homologue of an 
UNC-53 protein of C. elegans according to any of 
claims 1 to 9. 

36. A nucleic acid sequence encoding a 
vertebrate homologue of an UNC-53 protein of 
C. elegans or a functional equivalent, fragment, 
derivative or bioprecursor of said vertebrate 
homologue, for use as a medicament. 



37. A nucleic acid sequence according to claim 
36 wherein said nucleic acid sequence is a cDNA sequence 
as claimed in any of claims 10 to 14 or a functional 
fragment of said cDNA sequence. 

38. Use of a nucleic acid sequence encoding a 
vertebrate homologue of an UNC-53 protein of 
C. elegans or a functional equivalent, fragment, 
derivative or bioprecursor of said vertebrate 
homologue, in the manufacture of a medicament to 
promote neuronal regeneration, revascularisation or 
wound healing, or for treatment of chronic 
neurodegenerative diseases or acute traumatic injuries 
or fibrotic disease. 

39. Use of a nucleic acid sequence according to 
claim 38 wherein said sequence is a cDNA sequence as 
claimed in any of claims 10 to 14 or a functional 
fragment of said nucleic acid sequence. 

40. A pharmaceutical composition comprising a 
nucleic acid sequence according to claim 36 or 37 and 
a pharmaceutically acceptable carrier, diluent or 
excipient therefor. 

41. A pharmaceutical composition according to 
claim 40 wherein said nucleic acid sequence is a cDNA 
sequence as claimed in any of claims 10 to 14. 

42. A method of determining whether a compound 
is an inhibitor or enhancer of the regulation of cell 
behaviour, growth, cell shape or motility or the 
direction of cell migration, which method comprises 
contacting said compound with a host cell as claimed in 
claim 21 or 23 or a transgenic cell as claimed in any 
of claims 24 to 27 and screening for a phenotypic 
change in said cell. 

43. A method according to claim 41 which is 
capable of determining whether said compound is an 
inhibitor or an enhancer of the signal transduction 
pathway of said transgenic cell of which said 
vertebrate homologue of an UNC-53 protein or a 
functional equivalent, derivative, fragment or 
bioprecursor of said vertebrate homologue is a 
component or is an inhibitor or an enhancer of a 
parallel or redundant signal transduction pathway in 
said cell. 

44. A method according to claim 43 wherein said 
method is capable of determining whether said compound 
is an inhibitor or an enhancer of said vertebrate 
homologue of an UNC-53 protein of C. elegans or a 
functional equivalent, fragment, derivative or 
bioprecursor of said vertebrate homologue. 

45. A method according to any of claims 42 to 44 
wherein said phenotypic change to be screened is a 

io : 0 h t a uit y : cel1 growth - or shape ° r a — in ~» 

46. A method according to any of claims 42 to 44 
wherein said phenotypic change to be screened is a 
change in filopodia outgrowth, ruffling behaviour, 
cell adhesion, contact inhibition or the length of 
neurite growth. 

47. A method as claimed in any of claims 42 to 
44 wherein said transgenic cell is an N4 neuroblastoma 
cell and the phenotypic change to be screened is the 
length of neurite growth. 

48. A method as claimed in any of claims 42 to 
44 wherein said transgenic cell is an MCF-7 breast 
carcinoma cell or an NIH3T3 cell and the phenotypic 
change to be screened is the extent of phagokinesis or 
contact inhibition. 

49. A method of determining whether a compound 
is an inhibitor or an enhancer of the regulation of 
cell shape, cell growth or motility or of the 
direction of cell migration, which method comprises 
administering said compound to a transgenic organism 
according to any of claims 24 to 28 or a mutant 
organism produced according to the method of claim 29 
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and screening for a phenotypic change in said 
organism. 

50. A method according to claim 49, wherein said 
method is capable of determining whether said compound 
is an inhibitor or enhancer of a protein of the signal 
transduction pathway of said transgenic or mutant 
organisms, of which the vertebrate homologue of UNC-53 
protein of C. elegans or a functional equivalent, 
derivative, fragment or bioprecursor of said 
vertebrate homologue is a component, or is an 
inhibitor or an enhancer of a parallel or redundant 
signal transduction pathway in said cell. 

51. A method according to claim 50 wherein said 
method is capable of determining whether said compound 
is an inhibitor or an enhancer of the vertebrate 
homologue of UNC-53 protein itself or a functional 
equivalent, fragment, derivative or bioprecursor of 
said vertebrate homologue. 

52. A compound which is identifiable by the 
method according to any one of the claims 42 to 51 as 
an enhancer of the regulation of cell shape, or growth 
or motility or the direction of cell migration for use 
as a medicament for promoting neuronal regeneration, 
revascularisation or wound healing or for treatment of 
chronic neurodegenerative diseases or acute traumatic 
injuries or fibrotic disease. 

53. Use of a compound which is identifiable by 
the method according to any one of the claims 42 to 51 
as an enhancer of the regulation of cell shape, or 
growth or motility or the direction of cell migration 
in the preparation of a medicament for promoting 
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neuronal regeneration, revascularisation or wound 
healing or for treatment of chronic neurodegenerative 
diseases or acute traumatic injuries or fibrotic 
disease. 

54. A pharmaceutical composition comprising a 
compound identified according to the method of any of 
claims 42 to 51 and a pharmaceutically 
acceptable carrier, diluent or excipient therefor. 

55. A compound which is identifiable by the 
method according to any one of claims 42 to 51 as an 
inhibitor of the regulation of cell motility, growth, 
or shape, or the direction of cell migration, for use 
as a medicament for alleviating the spread of disease 
inducing cells or metastasis or loss of contact 
inhibition. 

56. Use of a compound according to claim 55 in 
the manufacture of a medicament for alleviating the 
spread of disease inducing cells or metastasis or loss 
of contact inhibition. 

57. A pharmaceutical composition comprising the 
compound as claimed in claim 55, and a 
pharmaceutically acceptable carrier, diluent or 
excipient therefor. 

58. A method of determining whether a compound 
is an inhibitor or an enhancer of transcription of a 
gene encoding a vertebrate homologue of UNC-53 protein 
of C. elegans, which method comprises the steps of (a) 
contacting said compound with a cell according to any 
of claims 22 or 27 and (b) monitoring the level of 
said reporter molecule and comparing the results 
obtained from said monitoring step with a control 
comprising a cell according to claims 22 or 27, which 
cell has not been contacted with said compound. 

59. A method as claimed in claim 58 wherein said 
reporter molecule detected is mRNA or green 
fluorescent protein. 

60. A compound which is identifiable by the 
method according to claims 58 or 59, as an enhancer of 
transcription of a gene coding for a vertebrate 
homologue of an UNC-53 protein of C. elegans or a 
functional fragment of said gene, for use in promoting 
neuronal regeneration, revascularisation or wound 
healing, or for treatment of chronic 
neurodegenerative diseases or acute traumatic injuries or 
fibrotic disease. 

61. Use of a compound which is identifiable by 
the method of claims 58 or 59, as an enhancer of 
transcription of a gene coding for a vertebrate 
homologue of an UNC-53 protein of C. elegans or a 
functional fragment of said gene, in the manufacture 
of a medicament for promoting neuronal regeneration, 
revascularisation or wound healing, or for treatment 
of chronic neurodegenerative diseases or acute 
traumatic injuries or fibrotic disease. 

62. A pharmaceutical composition which comprises 
the compound of claim 60 and a pharmaceutically 
acceptable carrier, diluent or excipient therefor. 

63. A compound which is identifiable by the 
method of claims 58 or 59 as an inhibitor of 
transcription of a gene coding for a vertebrate 
homologue of an UNC-53 protein of C. elegans or a 
functional fragment of said gene for use in 
alleviating the spread of disease inducing cells or 
metastasis or loss of contact inhibition. 

64. Use of a compound which is identifiable by 
the method of claims 58 or 59 as an inhibitor of 
transcription of a gene coding for a vertebrate 
homologue of an UNC-53 protein of C. elegans, or a 
functional fragment of said gene, in the manufacture 
of a medicament for alleviating spread of disease 
inducing cells or metastasis or loss of contact 
inhibition. 

65. A pharmaceutical composition which comprises 
the compound of claim 63 and a pharmaceutically 
acceptable carrier, diluent or excipient therefor. 
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66. A kit for determining whether a compound is 
an enhancer or an inhibitor of the regulation of cell 
motility, growth or shape or the direction of cell 
migration which kit comprises at least one transgenic 
cell as claimed in any one of claims 22 to 25 to be 
contacted with said compound and at least one cell 
according to claims 21 to 28 to be used as a control 
and means for contacting said compound with one of 
said at least one transgenic cells. 

67. A kit for determining whether a compound is 
an inhibitor or an enhancer of transcription of a 
gene coding for a vertebrate homologue of an UNC-53 
protein of C. elegans or a functional fragment of said 
gene which kit comprises at least one cell as claimed 
in any one of claims 21 to 25 and means for contacting 
said compound with said cells. 

68. A kit for determining whether a compound is 
an enhancer or an inhibitor of the activity of a 
vertebrate homologue of an UNC-53 protein of 
C. elegans or a functional equivalent, derivative, 
fragment or bioprecursor of said 
protein, which kit comprises a 
mutant non-human organism produced according to the 
method as claimed in claim 29 or a transgenic organism 
as claimed in claims 24 to 28 and a wild type of said 
vertebrate mutant organism. 

69. A method of identifying vertebrate 
homologues of an unc-53 gene of C. elegans or a 
functional fragment thereof, which method comprises 
1S hybridising to a DNA library a =« ta " e 

oligonucleotide sequence of between 15 to 50 
nucleotides of the nucleic acid sequence encoding 
unc-53 or a functional equivalent, derivative or 
bioprecursor thereof, under ^"^"tfally 
stringency to identify genes having statistically 
significant homology with the cDNA according to any of 
claims 10 to 14. 

70. A method of identifying a protein which is 
active in the signal transduction pathway of a cell of 
which a vertebrate homologue of an UNC-53 protein of 
C. elegans or a functional equivalent, fragment or 
bioprecursor of said vertebrate homologue is a 
component, which method comprises: 

(a) contacting an extract of said cell with an 
antibody to the vertebrate homologue of the 
UNC-53 protein of C. elegans or a functional 
equivalent, fragment, derivative or bioprecursor 
of said protein, 

(b) identifying the antibody/vertebrate 
homologue complex, and 

(c) analysing the complex to identify any 
protein bound to the vertebrate homologue of 
UNC-53 protein of C. elegans other than the 
antibody. 






71. A method of identifying a further protein 
which is active in the signal transduction pathway of 
a cell of which a vertebrate homologue of an UNC-53 
protein or a functional equivalent, fragment or 
bioprecursor of said UNC-53 protein is a component, 
which method comprises: 

(a) forming an antibody to the first 
identified protein bound to the vertebrate 
homologue of UNC-53 protein of C. elegans in 
claim 70, 

(b) contacting a cell extract with said 
antibody and identifying the 
antibody/protein complex, 

(c) analysing the complex to identify any 
further protein bound to the first protein 
other than the antibody, and 

(d) optionally repeating steps (a) to (c) 
to identify further proteins in said 
pathway. 

72. A method of identifying a protein which is 
active in the signal transduction pathway of a cell of 
which a vertebrate homologue of an UNC-53 protein of 
C. elegans or a functional equivalent, fragment or 
bioprecursor of said homologue protein is a component, 
which method comprises: 

(a) contacting an extract of said cell with 
the vertebrate homologue of an UNC-53 
protein of C. elegans or a functional 
equivalent, derivative or bioprecursor of 
said vertebrate homologue, 

(b) identifying any vertebrate homologue of 
UNC-53 protein/protein complex formed, and 

(c) analysing the complex to identify any 
protein bound to the vertebrate homologue of 
UNC-53 protein other than the same 
vertebrate homologue of UNC-53 protein. 

73. A method according to claim 72 which further 
comprises contacting a cell extract with any protein 
identified from step (c) not being the same as the 
vertebrate homologue of UNC-53 protein used and 
repeating steps (b) and (c) so as to identify any 
further protein involved in the signal transduction 
pathway of said cell. 

74. A method of identifying a protein involved 
in the signal transduction pathway of a cell of which 
a vertebrate homologue of an UNC-53 protein of C. elegans 
is a component, which method comprises: 

(a) providing an appropriate host cell 
having a DNA construct comprising a reporter 
gene under the control of a promoter 
regulated by a transcription factor having a 
DNA binding domain and an activating domain, 
(b) expressing in said host cell a first 
hybrid DNA sequence encoding a first fusion 
of a fragment or all of a DNA sequence 
according to any of claims 10 to 14 and 
either said DNA binding domain or the 
activating domain of the transcription 
factor, 
(c) expressing in the host cell at least 
one second hybrid DNA sequence encoding a 
putative binding protein to be investigated 
together with the DNA binding or activating 
domain of the transcription factor which is 
not incorporated in the first fusion, 
(d) detecting any binding of the protein 
being investigated with a protein according 
to any of claims 1 to 9 by detecting for the 
production of any reporter gene product in 
said host cell. 



75. A protein identified by the method of any 
one of claims 70 to 74 for use as a medicament to 
promote neuronal regeneration, revascularisation or 
wound healing, or for treatment of chronic 
neurodegenerative diseases or acute traumatic injuries or 
fibrotic disease. 

76. Use of a protein identified by the methods 
of any one of claims 70 to 74 in the manufacture of a 
medicament for promoting neuronal regeneration, 
revascularisation or wound healing, or for treatment 
of chronic neurodegenerative diseases or acute 
traumatic injuries or fibrotic disease. 

77. A pharmaceutical composition comprising a 
protein identified by the methods of any one of claims 
70 to 74 and a pharmaceutically acceptable carrier, 
diluent, or excipient therefor. 

78. A process for producing a vertebrate 
homologue of an UNC-53 protein of C. elegans or a 
functional equivalent, fragment, derivative or 
bioprecursor of said vertebrate homologue which 
process comprises culturing the cells of any of claims 
21 to 28 and recovering said vertebrate homologue of 
UNC-53 protein expressed. 

79. A process for producing a vertebrate 
homologue of an UNC-53 protein of C. elegans or a 
functional equivalent, fragment, derivative or 
bioprecursor of said protein which process comprises 
culturing an insect cell transfected with a 
recombinant Baculovirus vector, said vector comprising 
a DNA insert encoding said vertebrate homologue of 
UNC-53 protein or a functional equivalent, fragment or 
bioprecursor of said vertebrate homologue, downstream 
of the Baculovirus polyhedrin promoter, and recovering 
the expressed vertebrate homologue of UNC-53 protein. 

80. A nucleotide sequence comprising the 
sequence as shown in figure 15. 

81. A nucleotide sequence comprising the 
sequence as shown in figure 16. 


82. A nucleotide sequence comprising the 
sequence as shown in figure 17. 

83. A method of detecting whether a compound is 
an inhibitor or an enhancer of expression of a 
vertebrate homologue of an UNC-53 of C. elegans, or a 
functional equivalent, derivative or fragment of said 
vertebrate homologue which method comprises contacting 
a cell expressing said homologue with said compound 
and monitoring for a phenotypic change compared to a 
control cell which has not been contacted with said 
compound. 

84. A method according to claim 83 wherein said 
cell comprises a cell according to any of claims 21 to 
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85. A method according to claim 83 wherein said 
cell has undergone loss of contact inhibition. 

86. A method according to any of claims 83 to 85 
which is capable of determining whether said compound 
is an inhibitor of expression of said vertebrate 
homologue in which the compound to be tested comprises 
a nucleic acid. 

87. A method according to claim 86 wherein said 
nucleic acid sequence comprises an antisense DNA or 
RNA sequence. 

88. A method according to claim 87 wherein said 
mRNA sequence comprises 3' untranslated regions of 
mRNA encoding for said vertebrate homologue. 

89. A method according to any of claims 83 to 85 
wherein said compound to be tested comprises a protein 
having an amino acid sequence potentially suitable for 
inhibiting function of said vertebrate homologue. 

90. A method according to claim 89 wherein said 
protein comprises a protein identified according to 
any of the methods of claims 70 to 74. 

91. A pharmaceutical composition comprising a 
compound identified according to any of claims 83 to 
89 together with a pharmaceutically acceptable 
carrier, diluent or excipient therefor. 

92. A nucleic acid sequence identified according 
to the method of any of claims 86 to 88 for use in 
treatment of loss of contact inhibition or carcinoma 
which is mediated by a vertebrate homologue of an 
UNC-53 protein of C. elegans or a functional 
equivalent, fragment, derivative or bioprecursor 
thereof. 

93. Use of a nucleotide sequence identified 
according to the method of any one of claims 86 to 88 
in the preparation of a medicament for the treatment 
of loss of contact inhibition or carcinoma which is 
mediated by a vertebrate homologue of an UNC-53 
protein of C. elegans or a functional equivalent, 
fragment, derivative or bioprecursor of said 
vertebrate homologue. 

94. A nucleic acid according to claim 92 for use 
in the preparation of a medicament for inhibiting 
expression of a gene coding for a vertebrate homologue 
of an UNC-53 protein of C. elegans. 

95. A NIH3T3 cell line transfected with pCB201 
and deposited under LMBP Accession No. 1603CB. 

96. A plasmid pCB 201 of Sequence ID No. 10 
deposited under LMBP Accession No. LMBP 3594. 

97. A MCF-7 cell line transfected with plasmid 
pCB 201 deposited under LMBP Accession No. LMBP 
1601CB. 

98. An assay for detecting expression of a 
vertebrate homologue of UNC-53 protein of C. elegans 
in a vertebrate cell which assay comprises contacting 
a cell or an extract thereof with an antibody to said 
vertebrate homologue, or a functional equivalent, 
derivative or bioprecursor thereof, which antibody is 
linked to a reporter molecule, removing any unbound 
antibody and monitoring for the presence of said 
reporter molecule. 

99. An assay according to claim 98 wherein said 
reporter molecule is an antibody conjugated with a 
suitable fluorophore or detectable enzyme. 

100. A method for detecting for expression of a 
gene coding for a vertebrate homologue of an UNC-53 
protein of C. elegans or a functional equivalent, 
derivative, fragment or bioprecursor thereof, which 
method comprises contacting a probe specific for a 
nucleic acid or protein sequence coding for or 
corresponding to said vertebrate homologue or a 
functional equivalent, fragment or bioprecursor 
therefor with a cell extract which probe is linked to 
a reporter and analysing for the presence of said 
reporter. 

101. A method according to claim 100 wherein 
said probe comprises a complementary sequence to a 
region of mRNA transcribed from said gene encoding 
said vertebrate homologue of UNC-53 protein or a 
functional equivalent, derivative or bioprecursor 
therefor. 

102. A method according to claim 101 wherein 
said complementary sequence is a 3' or 5' untranslated 
region of said mRNA. 
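
As a purely illustrative sketch, a probe sequence complementary to a region of mRNA can be derived by taking the reverse complement of that region. The function name and the example region are hypothetical and are not taken from any sequence of the specification.

```python
# Watson-Crick partners for an RNA template read into a DNA probe.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def antisense_dna_probe(mrna_region):
    """Return the DNA strand complementary to an mRNA region, written 5' to 3'."""
    region = mrna_region.upper()
    # Reverse first so that the returned probe also reads 5' to 3'.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(region))


# Hypothetical six-base region, used only to show the call.
probe = antisense_dna_probe("AUGGCC")
```

In practice a probe would be derived from a longer stretch of the untranslated region than this six-base example.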

103. A method according to claims 100 or 102 
wherein said reporter comprises a radiolabel. 
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104. A method according to claim 100 wherein 
said probe comprises an antibody specific for said 
vertebrate homologue of said UNC-53 protein or a 
functional equivalent, derivative, fragment or 
bioprecursor therefor. 

105. A method according to claim 104 wherein 
said reporter comprises an antibody conjugated with a 
detectable fluorophore or enzyme. 

106. Phage Lambda clone 3b of Sequence ID No. 5. 

107. A method of determining whether a compound 
is an inhibitor or an enhancer of association of UNC-53 
or a vertebrate homologue thereof according to any 
of claims 1 to 9 to microtubules or plus end 
regions thereof, which method comprises: 

(a) contacting said compound with a 
transgenic cell, tissue or organism 
expressing UNC-53 protein or said vertebrate 
homologue and which protein is operably 
linked to a reporter molecule, and 

(b) screening for the localisation of said 
reporter molecule as compared to a cell 
according to step (a) which has not been 
contacted with said compound. 



108. A compound identifiable by the method 
according to claim 107. 

109. A compound identifiable by the method 
according to claim 107 as an inhibitor of localisation 
or association of UNC-53 or said vertebrate homologue 
with microtubules or the plus end region thereof for 
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use in alleviating the spread of disease inducing 
cells or metastasis or loss of contact inhibition. 

110. A compound identifiable by the method 
according to claim 107 as an enhancer of association 
of UNC-53 or said vertebrate homologue with 
microtubules or the plus end region thereof, for use 
in promoting neuronal regeneration, revascularisation 
or wound healing, or for treating chronic 
neurodegenerative diseases or acute traumatic injuries 
or fibrotic disease. 

111. A pharmaceutical composition comprising the 
compound according to claims 108 or 109 and a 
pharmaceutically acceptable carrier, diluent or 
excipient therefor. 

112. A kit for determining whether a compound is 
an inhibitor or an enhancer of association of UNC-53 
or a vertebrate homologue thereof according to any of 
claims 1 to 9 with microtubules or the plus end 
regions thereof, which kit comprises at least one 
transgenic cell expressing UNC-53 and a reporter 
molecule or a cell according to any of claims 20 to 24 
and at least one cell of the same cell type for use as 
a control and means for contacting said compound with 
one of said at least one transgenic cells. 

113. A composition comprising UNC-53 of 
C. elegans or a vertebrate homologue thereof according to 
any of claims 1 to 9 linked to a compound identified 
as an inhibitor or enhancer of association of UNC-53 
or said vertebrate homologue with microtubules or 
their plus end regions for use in targeting said 
compound to said microtubule or the plus end regions 
thereof. 

114. A composition according to claim 113 which 
further comprises a cell transformation or 
transfecting agent. 

115. A method of targeting a protein to a cell 
microtubule or the plus end region thereof, which 
method comprises introducing into a host cell, tissue 
or organism a transgene comprising a sequence capable 
of expressing UNC-53 or a vertebrate homologue thereof 
according to any of claims 1 to 9, which sequence is 
operably linked to a sequence encoding said protein to 
be targeted such that a chimeric protein is expressed 
and which results in targeting said protein to said 
microtubule or a plus end region thereof. 

116. A method of identifying a molecule which 
covalently modifies UNC-53 or a vertebrate homologue 
thereof according to any of claims 1 to 9, which 
method comprises 

a) contacting either an extract from a cell 
expressing UNC-53 or said vertebrate homologue or a 
mixture of enzymes comprising candidate UNC-53 
modifying enzymes in the presence of an indicator of 
covalent modification of a protein, 

b) identifying any covalently modified UNC-53 
protein from step a), 

c) identifying said molecule involved in said 
modification step. 



117. A method according to claim 116, wherein 
said indicator comprises ³²P. 
118. A method of identifying a compound which 
alleviates or enhances the toxicity of UNC-53 or a 
vertebrate homologue thereof according to any of 
claims 1 to 9, which method comprises contacting said 
compound with a cell, tissue or organism according to 
claim 27, and monitoring for the presence of said 
reporter molecule adjacent said microtubules or the 
plus end regions thereof. 

119. Plasmid pLM1 of Sequence ID No. 6 deposited 
under Accession No. LMBP 3762. 

120. Plasmid pLM4 of Sequence ID No. 7 deposited 
under Accession No. LMBP 3763. 

121. Plasmid pEGF72 of Sequence ID No. 8 
deposited under Accession No. LMBP 3764. 

122. Plasmid pCBSOl of Sequence ID No. 9 
deposited under LMBP Accession No. LMBP 3765. 

123. A worm strain comprising a chimeric 
C.elegans human unc-53 gene deposited under LMBP 
Accession No. LMBP-1663CB. 

124. A vertebrate homologue according to any of 
claims 1 to 3 which is a mouse homologue. 

125. A homologue according to claim 124 having 
the sequence illustrated in Figure 14. 



Fig. 30: pEGFP72 (1 > 9697), site and sequence [nucleotide sequence figure with translation] 

WO 98/24810 10/270 PCT/EP97/06956 



. Tuesday. 18 November 1997 1 1 :46 
fig 31 pEGFPsma (1 >6960) Site and Sequence Pa 9 e * 

■ CACATGTTCTTTCCTGCGTTATCCCCTGATTCTGTGGATAACCGTATTACCGCCAT GCAT 

GTGTACAAGAAAGGACGCAATAGGGGACTAAGACACCTATTGGCATAATGGCGGTACGTA 6960 
HMFFPALSPOSVONR [ TAMH 
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a rr a * t t * /-/-.». - _ . ' . " " " 



2 
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TAGTTAT TAATAGrAArCAArTACGGGGTCATTAGTTCATAGCCCATATA TGGAGTTC CGCGTTA CATAAC TJ ACGQT AAATGCCCCr.r r T r.nr Tn *rr '~ 

A T.** A ATA ATTATr-ATTA^TT*. + T- ~ ~ ~ _ _ ! ' _ 1 1 1 I i ... I ■ - - 



1 ' 1 1 | ■ j , . , ~ ■ 1 I UuLLLuLL I uUC TGAC'"" 

ATCAATAATTATCATTAGTTAATGCCCCAGTAATCAAGTATCGGGTATATACCTCAAGGCGCAATGTATlGAATGCCATTTACCGGGCGGACCGACrr^ 
^ V1NYGV I 5 S . P I YGVP R Y I TYGKVPav L ^ 

CCC AACG ACCCCCGCCCATTGACGTCAATAATGA CGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGT CAATKGCTf;f;Af;TA ttta/-/-^- 

GGGTTGCTGGGGGCGGGTAACTGCAGTTATTACTGCATACAAGGGTATCATTGCGGTTAlcCCTGAAAGGrAACTGCAGrTACCCACCTCATAAATGCC- * 
' " R P " " ' ° V N N ° V C S H S N A N R Q F p L T s M 6 G v p T ^ 

AAACTGCCCACTTGGCAGTACATCAAGTGTATCA TATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGG CCCGCCTGGrATTATnrrr aktj 

tttgacgggtgaaccgtca tgtagttcacatagtatacggttcatgcgggggataactgcagttac tgccatt taccgggcggaccgtaatacgggtcat 20 

^ '^ P '*" ^S TS SVS Y A K Y ' A P Y . ^ Q ■ R . M A R |_ A L C P V 
CATGACCTTATr,r,r,ArTTTrrTArTTrrr*/.T.^ . ^ 

■ , T — • ■^ r .»v.M,v. ,^u.« l lA^^A.CGCTATTACCATGGTGATGCGGTTT TGGCAGTACATrAATr.r^r^, 

gtactggaataccctgaaaggatgaaccgtcatgtagatgcataatcagIagcgataatggtaccactacgccaaaaccgtcatgtagtmcccgcacct ^ 
hdlmglsylavh lr *SHRYYHGOAVLAVHQVAW 

TAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCAC CCCATTGACGTCAArGGGAGTTTGTTTTGGCACCAAAAT CAACGGGACTTTCCAAAArn T r. T ... 

ATCGCCAAACTGAGTGCCCCTAAAGGTTCAGAGGTGGGGrAACTGCAGTlACCCTCAAACAAAACCGTGGTTTTAGTTGCCCTGAAAGGlTTTACAGCA; 
1 *■ " ■ L T ° ' S K $ " 9 " ■ * 0 * E F ,V L A P K S T G L S K M S 

ACAACTCCGCCCCATTGAC G CAAArGGGCGGTAGG CGTGTACGGTGGGAGGTCTATATAAGCAGAGCTGGTTTAG TGAACCGTCA G ATrr,r TAf : r .,- T , 

rGTTGAGGCGGGGTAACTGCGTTTACCCGCCATCCGCACATGCCACCCTCCAGArATATlcGTCTCGACCAAArCACT TGGCAGTCTAGGCGATCGCGAT 
Q LR P I0ANGR a c t v g g l y k Q s WFSEPSOPLAL 



CCGGTCGCCACC ATGGTGAGCAAGGGCGAGGAGC TG TTC ACCGGGGTGGTGCCCA TCCTGGTCGAGCTGGACG GCGACGTAAACGGCCACAAftTTr Ar:r-: 
GGCCAGCGGTGG TACCACTCGTTCCCGCTCCrcGACA AGTGGCCCCACCACGGGTAGGACCAGCTCGACCTGCCG CTGCATTTGCCGGTGrTCAAGTCG: 



500 



eGFPC.e.unc53ecl 



? V " V S K G E E L F T G V ^ ' L V E L D G 0 V , N 6 H K P S 

TGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAG CTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCC CGTGCCCTGGCCCArrrrrr.r.,- 
ACAGGCCGCTCCCGCTCCCCCMCGGTGGArGCCGrTCG^C TGGGACrTCAAGTAGACGlG GrGGCCGTlcGACGGGCACGGGACCGGGUnA.rA.T-": 



-eGFPC.e.unc53ecl 

V S ° E G E G - ° A T Y G * L T L * ' ' c T r C K , P y p y p T L v T 

CACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCC CGACCACATGAAGCAGCACGACTTCTTCAAGrCCGCCAT GCCCGAAGGCTACGTCCAGGA., 
O.GGGACrGGATGCCGCACGrCACGAAGTCGGCGATGGG GCTGGTGrAclrCGTCGTG clGAAGAAGTTCAGGCGGTACGGGCTTCCGAlGCA.GTrrT - 5 - 



-eGFPC.e.unc53ecl 



■ " T . " G V ° C F 5 » Y P 0 H H K Q H D P F K s A „ „ £ G y y Q £ 

CGC ACCATCTTCTTCAAGGACGACGGCAACTACAAGA CCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGA ACCGCATCGAGrTr.A^f;-;r.r />Tf: 
Gi.oTGGTAGAAGAAGTTCCTGCTGCCGTTGATGTTCTGGGC GCGGCTCCACTTCAAGCTCCCGc" TGTGGGACCACTTGGCGrAGC TCGACTTCCCGTAGC 



©GFPC.e.unc53ecl 
T ' , F F K 0 P GNYKTRAEVKF 



G 0 T 1 V « R i E L K G I 



BNSDOCID: <WO 982481 0A2_I_> 
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Pagel- 



ACTTCAAGGAGGACGGCAACATCCrGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATAT CATGGCCGACAAGCAGAAGAACGGCATCAu 
TGAAGTTCCTCCTGCCGTTGTAGGACCCCGTGTTCGACC TC ATGTTGATGTTGTCGGT6TTGCAGA TATAG TACCGGCTG TTCGTCTTCTTGCCGTAG T~ 



-eGFPC.e.unc53ecl 



OFKEDGNILGHKLEYNYNSHN 



VY IMAOKQKMG 



I J 



GGTGAACTTCAAGATCCGCCACAAC ATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGG CGACGGCCCCGT 
CCACTTGAAGTTCTAGGCGGTGTTGTAGCTCCTGCCGTCGCACGTCGAGCGGC TGGTGATGGTCGTCTTGTGGGGGTAGCCGCTGCCGGGGCACGArr,^r 



~ — e G FPC. e. "^^-~| - 

V N F . K 1 R . H N ' E D G S V Q L A 0 HYQQNTPIGDGPVLL 



CCCGACAACCACTACCTGAGCACCCAGTCCGCCC TGAGCAAAGACCCCAACGAGAAGCGCGATCAC ATGGTCC TGCTGGAGTTCGTGAC CGCCGCCGGGA 

gggctgttggtgatggactcgtgggtcaggcgggactcgIttctggggtIgctcttcgcgctagIgtaccaggacgaccIcaagcactggcggcgg ,30c 



-eGFPC.e.unc53ecl 



PONHYLSTQS 



a l s k o pnekrohmvllefvtaag 



tcactctcggcatggacgagctgtacaagtccggactcagatctacgtcaaatgtagaattgataccaatctacacggattgggccaa ^^ 
agtgagagccgtacctgctcgacatgttcaggcctgagtctagatgcagtttacatcttaactatggttagatgtgcctaacccg^ 




[ T - L G M ° E . L Y K , S G «- * S T S N V E L IP I YTDVANRHL 



CG^h 



gaagggcagctta tcaaag tcgattagggatatttccaa tgattttcgcgactatcgactggtttctcagcttattaatg tgatcgttccgatcaa 
c ttcccgtcgaatagtttc a ^^^ aa ^ccctataaaggttactaaaagcgctgatagctgac caaagagtcgaataattacactagcaaggctag W^rT 

-eGFPC.e.unc53ecl 




" C.e.unc53ed — — _ 

K G 5 . L S K S 1 p P t S N DFRDYRLVSQLINVIV 



I M E 



TTC GC C TGC A T TC ACGAAACGTTTGG CAAAAATC AC A TCGAACC TGGATGGCCTCGAAACGTGTCTCGACTACCTGAAAAATC TGGGTC TCGAC XQC 
AAGAGCGGACGTaAGTGCTTTGCAAACCGTTTTTAGTGTAGC TTGGACCTACCGGAGCTTTGCACAGAGCTGATGGACTTTrTAGACCCAGAGC TGA'i'll 




F SPAFTKRL 



A , K 1 T S " LOGLETCLDYLKNLGLOC 



CGAAACTCACCAAA ACCGATATCGACAGC^ 

GCTTrGAGTGGTTTTGGCTATAGCTGTCGCCTTTGAACCCACGTCAAGAGGKGACGAGAAGGACGAGAGGTGGATGTTCGTCTTCGAAGCCGT^ 



•C.e.unc53 ect 



3 * LT XroiOSGNLG 



A , V «■ Q L L F L L S T Y K 0 K L 3 Q L » 



BNSDOCID: <WO 982481 0A2_I_> 



m 

WO 98/24810 



13/270 



# 

PCT/EP97/06956 
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AAAAGATCAGAAGAAArTGGAGCAACTACCCAC ATCCATrATGCCACCCGCGGTTTCTAAArTACCCTCGCCACGTGTCG CCACGTCAGCAArrnr TTi" . 
7TrTCTAGTCTTCTTTAACCTCGTTGATGGGTGTAGGTAATACGGTGGGCGCCAAAGAT;TAATGGGAGCGGTGCACAGCG G r G CAr. T r.-^^^ 




GCAACTAACCCAAATTCCAACTTTCCACAAATGTCAA CATCCAGGCTTCAGACTCCACAG TCAAGAATATC GAAAA TTGATTCATCAAAGATTnr.TATr.-. 
COTTGATTGGGTTTAAGGTTGAAAGGTGTTTACAGT TGTAGGTCCGAAGlcTGAGGTGTCAGTTCTr AT^rTrTT^,:^^.^. rnT ..I 




A T N P H S N F P Q MSTSRLQTPOSRI S 



AGCCAAAGACGTCTGGACTTAAACCACCCTCATCATCAA CCACTTCATCAAATAATACAAATTCATTCCGTCCGT CGAGCCGTTCGAGTGGCAATaaTaj 
TCGGTTTCTGCAGACCrGAATTTGGTGGGAGTAGTAGTTGGTGAAGTAGiTTATTATGTlrAAGTAAGGCAGGCAGCTCGGCAAGrT^ 



3TTATTAT7 



20CC 



-eGFPC.o.unc53ed 



kPKTSGLKPPS 



-C.e.unc53 ed 



SSTTSSNNTNSF 



RP5SRSSGNNN 



TGr T GGCTCGACGATATCCACArcrGC G AAGA G CTTAGA ATCArCATCAACGTACAGCTCTATTTCGAATCTAAAC CGACCrACC TC CCAA C T.rAAA., 
MCAACCGAGCTGCTATAGGTGTAGACGCTTCTCGAATCTTAG TAGTAGTTGCATGTCGAGAT AAAGCTTAGATTTnnr Tf;i;AT^r:AfTr;nT I 



-eGFPC.g.uncS3ecl 



*' GST I S T S 



•Co.unc53 eel 



AKSLESSSTYSS 




? S R p 0 t Q 



-C.o.unc53 eel 



L , V R . v A T T T K I G S S K L A A P 



KAVSTPKL 



!1T™ A * GA !I A " GG : GCAAA ^^^ 

TACGACTTTAATTTCAATAAGTCATCGTTTTTGGGTAn 



GAAGACACTTCTGATAACCrCGTT"rTGTTCTCGGGCTAT rGTCGfTArr ArrArrArrArrr 



-eGFPC.e.unc53ecl 



A 3 V K T I g 



-C.e.unc53 ed 



A K Q , E P D N S G GGGGGMLKLKLF 



S S 



N P 
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7-CCTCATCGAATAGCCCACAACCTACGAGAAAGGCGGCGGCGGTGCCTCAACAACAAACTTTGTCGAAAATCGCTGCCCCAGTG AAAAGTGGCCTGAAo 
**3GAGTACCTTATCGGGTGTTCGATCCTCTTTCC6C^ 



— C.e.unc53 eci 



5 SS N5P QPTRKAA AV PQQQTLSK IAAP V K 



S G L K 



CCGCCGACCAGTAAGCTGGGAAGTGCCACGTCTATGTCGAAGCTTTGTACGCCAAAAGTTTCCTACCGTAAAACGGACGC 

ggcggctggtcattcgacccttcacggtgcagatacagcttcgaaacatgcggttttcaaaggatggcattttgcctgcggggttagtaIa 



■eGFPC.e.unc53ect 



PPTSKLGSATSMSK 



-C.e.unc53 eci 



L , c r p K V. S Y RKTOAPIisqq 



actcgaaacgatg ctcaaagagcagtgaagaagagtccggatacgctggattcaacagcacgtcgccaacgk 

^ctttgctacgagtttctcgtcac^^ 26tX 



-eGFPC.e.unc53ed 



* C.e.unc53 ed 

OSK R CSKSSEEESGYAGFN 



3T SPTSSSTEGSLSf^ 



gcattccacatctt ccaagagttcaacgtcagacgaaaagtctccgtcatcagacgatcttactcttaacg^ 

cgtaaggtgtagaaggttctcaagttgcagtctgcttttcagaggcagtagtctgctagaatgagaattgcggaggtagcactgtcgatagtc 27 ° ,: 



-eGFPC.e.unc53ed 



H STSSKSSTSDEK 



-C.e.unc53 ed 



SPSSOOLTLNASIVT 



A I R 0 P 



ATAGCCGCAACACCGGTTrCTCCAAATAT TATCAACAAGCCTGTTGAGG^AAACCAACACTGGCAGTGAAAGGAGTGAAAAGCACAGCGAAA AAA^TC 
TA-CGGCGrTGTGGCCAAAGAGGTTTATAATAGTTGTTCGGACAACTCCTTrTTGGTTGrGACCGTCAclTTCCTCACTlTTCGTGTCnrTTTTTT.-rJ: 



-eGFPC.e.unc53ecl 



-C.e.unc53 ed 



A A T P V 



3PNIINKPVEEKPT 



LAVKGVKSTAKKD 



CACCTCCAGCTGT TCCGCCACGTGACACCCAGCCAACAArCGGAGTTGTTAGTCCAATTATGGCACATAAG AAGTTGACAAA 
GTGGAGGTCGACAAGGCGG TGCACTGTGGGTCGGTTGTTAGCCTCAAC AAfrAnnTTiUArArrnTr TAT-rr 



TGACCCCGTGATATC TG»V 



TTCAACTGTTTACTGGGGCACTATAG 



-C.e.unc53 ed 



* P A V P p ROTQPT!GVVSPiM 



AHKKLTNDPV [ S E 
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K E . p E K L Q 5 " s ■ ° T - ° * > » L P > L * s , » , . : 



1 — 1 ■ ■ «- ^ n f o 

A TCCGACAACCACC AACGTACGATGTTC TTCTAAAAC A AGGAAAAATCar LTrcrrrr. rr , . Wf . _ n y , , „ 



3\C< 



C.e.unc53 eci 



JHOPPTYOv|.t.KQGK I TSPVKSFCyE"ossasCo 

;»,T 5 , escr c <TC c erc6CCTC>egIG cesccc CTrcr5CT><icy|trc|ici|i|; • 



w.v.UHWtM DU ■ ~ 

5 I V A H A S a OVTPPTKTSGN ' ~ 




CA3 C TrAT AATGGTTACAAATAAAGCAATAGCATCACAAAT TTr A r AA . T< ,^> TTT „ Trr . rrr . 

GTCGAATATTACCAATGITTATTTr ttat^^-t '-— * 1 ' TG " ATTC T * GT 'GTGGrT TGTCCAAACITAT.- ^ 

•S L . v ^ q ^ ^ g ^^^^^^^^^^ ^ G ^ AAaAAA ^^^^^^^^^ ^"CACCAAACAGGTTTGAGTAGT7 
UL ^-" » I F P T > r . LVFVOTHQ 






TGTAT 



ftTCTTAACGCGTAAATTGTAAG CGTTAATATT TTGT T AAAATTCGCGTTAAATTTTT GTTAAArCAGCTrATTTTTTA a/- p . , 

■ ■ • A L ' F C ■ " S » • ' F " " » » h r L T ^ PJ^T 

C^CCTTArAAArCAAAAGAATAGACCGA G ArAGGGTTGAGrGrTGTrCCA G Tr TGG AACAA G Ar. Trf -^ T , T p 




AAAGGGCGAAAAACCGTCTATCA6GGCGATGGCCC 




— ■ ^L tL DGE SR RTVRE RlCGR K ft 

GGCAAGTGTAGCGGTCACGCTGCGCGTAACCACC 



K E R A L G R 



ACACCCGCCGCGCTTAArGCGCCGCTACAGGGCGrr,rr A ^ T ^,^ TTTrrrrnnr|TrTLLL _ 



AAAGCCCCTTTACACGCG 



— ■ ■ f 5 ' ' ' ° ' ' ' » » » H . , , „ c , „ . , , „ . . 



ASQLVSNHSP 



A p 



IS2S ^" w .rri^r m -" - . ^ 

I , 1 LFv RPRLL 



QRSIKRQDED 



R F A 



TG^ACAAGATGGA TTGCACGCAGGTTCTCCCfirrnrT 

— ) . 4— — , . . , 7 ,v -^^'^ l ^^ibuciLACAACAGACAATrf;f;rTr:rTrTq^ Tf - rrrir ^ 

AC TTGTTCTACCTAACGTGCGTCCAAGAGGCCG GCGAACCCACCTC TCCGATAAGCCGA ' GCTCTGATGCCGtw 



ITTGGGTGGAGAGGCTATTCGGCrATGACTGGGCACA ACAGAC^ 

l H k h p c tqvlrpl gvr — gatac tgacccgtg ttgtc tgttagccg acgagac tacggcgg w7C,; 



gysamtgh 



NPOSAALMPP 



LSRPTCPVP MNTk-To^
fig 32 pEGFPecl (1 > 6700) Site and Sequence

GGCTGGCCACGACGGGCGTTCCTTGCGCAGCTGTGCTCGACGTTGTCACTGAAGCGGG AAGGGACTGGCTGCTATTGGGCGAAGTGCCGGGGCAGGATCT 
CCGACCGGTGCTGCCCGCAAGGAACGCGTCGACACGAGC TGCAACAGTGACTTCGCCCTTCCCTGACCGACGA TAACCCGCTTCACGGCCCCGTCCTAGA 
G V P R W A F L A Q L C S T L S L K R EGTGCY'JAKCRGR I 

CCTGTCATCTCACCTTGCTCCTGCCGAGAAAGTATCCATCATGGCTGATGCAATGCGG CGGCrGCATACGCTTGATCCGGCTACCrGCCCATTCGACCAC 
GGACAGTAGAGTGGAACGAGGACGGCTCTTTCATAGGTAGTACC6ACTACGTTACGCCGCCGACGTATGCGAACTAGGCCGATGGACGGGTAAGCTGGT3 ^ 
SC HLT LL LPR KYPSV LM QCG GC1RL I R |_ P A H S T T 

CAAGCGAAACATCGCATCGAGCGAGCACGTACTCGGATGGAAGCCGGTCTTGTCGATCAGG ATGATCTGGACGAAGAGCATCAGGGGCTCGCGCCAGCCG 
GTTCGCTTTGTAGCGTAGCTCGCTCGTGCATGAGCCTACCTTCGGCCAGAACAGCTAGTCCTACTAGACCTGCTTCTCGTAGTCCCCGAGCGCGGTCG6C 

KRN * ASSEHVLGWKPVLS ! R M IVTKS IRGSRQP 

— — * ■ ■ ■ t ■ ■ ... 1 ... - - 1 . . . . . , , . , , , 

A AC TGTTCGCCAGGCTCAAGGCGAGCATGCCCGACGGCGAGGATCTCGTCGT GACCCATGGCGATGCCTGCTTGCCGAATATCATGGTGGAAAATGGCCG 

' ' ' '" 1 * * ' ' * ' t ' ' ' ■ t « ■ m ■ 1 1 j ■ ■ . 1 I 1 . . !■ >■« ■ ■ 1 1 I 1 ■ t | I I C - "' •*•.* 

TTGACAAGCGGTCCGAGTTCCGrTrRTArn^arTRrrnrTrrTA/irtcrA^r ArTr.nr.r Arrnrr Arnn i^nrr a * ^i-^,- T -r . ^ . .1 

~. . „« w . , «w«w«>-«i-m-.^ W v»V. 1 IHIHU I I I I I ALLbUv. 

N C S P G S R R A C P T ARISS.PMAMPACRISVVKMA 

' 1 * ■ 1 ■ 1 ii i . . . . 1 . , . , 1 , , . . , . , [ 

CTTTTCTGGATTCATCGACTGTGGCCGGCTGGGTGTGGCGGACCGCTATCAGGACATAGCGTTG GCTACCCGTGATATTGCTGAAGAGCTTGGCGGCGAA 
GAAAAGACCTAAGTAGCTGACACCGGCCGACCCACACCGCCTGGCGATAGTCCTGTATCGCAACCGATGGGCACTATAACGACTTCTC6AACCGCCGCTT 5X '* 
A F . L P S STVAGWVWRTAIRT.RWLPV ILLKSLAAN 

' ' ' 1 I — ■ - I ■ — , I , ,, , ! , , , . 1 

TGGGCTGACCGCTTCCTCG TGCTTTACGGTATCGCCGCTCCCGATTCGCAGCGCATCGCCTTCTATCGCCTTCTTGACG AGTTCTTCTGAGCGGGACTCT 
ACCCGACTGGCGAAGGAGCACGAAATGCCATAGCGGCGAGGGCTAAGCGTCGCGTAGCGGAAGATAGCGGAAGAACTGCTCAAGAAGACTCGCCCTGAGA 
G L T A S S C F T V S P L P 1 P S A S P S IAFLTSSSEROS 

GGGGTTCGAAATGACCGACCAAGCGACGCCCAACCTGCCATCACGAGATTTCGATTCCACCGCCGCCTTCTATGAAAG GTTGGGCTTCGGAATCG 
CCCCAAGCTTTACTGGCTGGTTCGCTGCGGGTTGGACGGTAGTGCTCTAAAGCTAAGGTGGCGGCGGAAGATACTTTCCAACCCGAAGCCTTAGCAAAAG 
GVR ND RPS DA QPA1TRF RFHRRLL KVGLRNRF 

CGGGACGCCGGCTGGATGATCCTCCAGCGCGGGGATCTCATGCTGGAGTTCTTCGCCCACCCTAGGGGGAGGCTAACTGA AACACGGAAGGAGACAATAC 
GCCCTGC GGCCGACCTACTAGGAGGTCGCGCCCCTAGAGTACGACC TCAAGAAGCGGGTGGGATCCCCCTCCGATTGACTTTGTGCCTTCCTCTGTTATG 
P G , R R L 0 D , p P * R G S H A G V L R P P . G £ A M N T E G D M T 

CGGAAGG AA ^CCGCGCTATGACGGCAATAAAAAGACAGAATAAAACGCACGGTGTTGGGTCGTTTGTTCAT AAACGCGGGGTTCGGTCCCAGGGCTGGCA 
GCCTTCCTTGGGCGCGATACTGCCGTTATTTTTCTGTCTTATTTTGCGTGCCACAACCCAGCAAACAAGTATTTGCGCCCCAAGCCAGGGTCCCGACCGT 
G R N P R y D G N K K T E . N A R C V V VCS . TRGSVPGLA 

CTCTGTCGATACCCCACCGAGACCCCATTGGGGCCAATACGCCCGCGTTTCTTCCTTTTCCCCACCCCACCCCCCAAGTT 

GAGACAGCTATGGGGTGGCTCTGGGGTAACCCCGGTTATGCGGGCGCAAAGAAGGAAAAGGGGTGGGGTGGGGGGTTCAAGCCCACTTCCGGGTCCCGAG ^ 
LCR , YPT ETPLGPI RPRFFLFPTPPPKFG.RPRA 

GCAGCCAACGTCGGGGCGGCAGGCCCTGCCATAGCCTCAGGTTACTCATATATACTTTAGATTGATTTAAAAC T TCATTTTTAATTTAAAAGGATCTAGG 

CGTCGGTTGCAGCCCCGCCGTCCGGGACGGTATCGGAGTCCAATGAGTATATATGAAATCTAACTAAATTTTGAAGTAAAAATTAAATTTTCCTAGATC: ^ 
R S QRR G G RPC HSL RL L 1 Y T L 0 . F K T 5 F L I JC 0 L G 

TGAAGATCCTTTTTGATAA TTC TCATGACCAAAATCCCTTAACGTGAGTTTTCGTTCCACfGAGCGTCAGACCCC GTAGAAAAGATCAAAGGATC TTCTTG 
AC TTC ^ A ^GAAAAACT ATTAGAGTACTGGTTTTAGGGAATTGCACTCAAAAGCAA6GTGACTCGCAGTC TGGGGCATCTTTTC TAGTTTCC TAGAAGAAC ^ 
E ° P . f ' ■ S H. 0 QNPLT.VFVPLSVRPrrkOQRIFL
AGATCCTTTTTTTC TGCGCGTAATC TGC TGCTTGC AAACAAAAAAACCACCGC TACC AGCGGTGGTTTGTT TGCCGGA TCAAGAGCTACCAACTCT TTTT
TC TAGGAAA AAA AG AC GCGC ATT AG ACGACGAACGTT TGTTTTTTTGG TGGCGATGGTCGCCACCAAACAAACGGCCTAGTTC TCGATGGTTGAGAAAAA ^ 
R S F F S A R N L L L A H K K T T A T S GGLFAGSRATNSF 



CCGAAGGTAACTGGCTTCAGCAGAGCGC AGATACCAAATACTGTCCTTCTAGTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGC 



ACCGCCTA 



GGCTTCC ATTGACCGAAGTCGTCTCGCGTCTATGGTTTATGACAGGAAGATCACATCGGCATCAATCCGGTGGTGAAGTTCTTGAGACATCGTGGCGGAT ^ 
5E GNVL Q QS A DTK YCPS 5VA VVR PPLQELCSt/y 

CATACCTCGCTCTGCTAATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTC 

gtatggagcgagacgattaggacaatggtcaccgacgacggtcaccgctattcagcacagaa tggcccaacctgagttctgctatcaatggcctattccg 53CC 

t P R S A N P V T S G C C Q V R , y y s y R V G L K T \ V T G . G 
GCAGCGGTCGGGCTGAACGGGGGGTTCG^ 

cgtcgccagcccgacttgccccccaagcacgtgtgtcgggtcgaacctcgcttgctggatgtggcttgactctatggatgtcgcactcga 51iCC 

A A V G L N G G F V H T A Q L G A N 0 L H R T- £ I P T A . A M R K 

gccacgcttcccgaagggagaaaggcggacaggtatccggtaagcggcagggtcggaacaggagagcgcacgagggagcttccagggggaaa cgc^ 
cggtgcgaagggcttccctctttccgcctgtccataggccattcgccgtcccagccttgIcctctcgcgtgctccctcg^ 35CC 

» H A S R R E K G G Q V S G K R Q G R N R R A H E G A S R G K R L V 

atctttatagtcctgtcgggtttcgccacctctgacttgagcgtcgatttttgtgatgctcgtcaggggggcggagcct^ 

tagaaatatcaggacagcccaaagcggtggagactgaactcgcagctaaaaacactacgagcagtccccccgccIcggaIacctItttg^ 36C< 



S > •. S C R V S p P L T -ASIFVMLVRGAEPME 



K R Q Q P 



tgattctgtggataaccgtattaccgccatgcat 



ggcctttttacggttcctggccttttgctggccttttgctcacatg ttctttcctgcgttatcccc 
ccggaaaaatgccaaggaccggaaaacgaccggaaaacgagtgtacaagaaaggacgcaataggggactaagacacctattggca^^ 37iC 

G L F T V P G L L I, A F C S HVLSCVIP.FCG.PYYRHA



TAGTTATTAATAGTAATCAATTACGGGGTCATTAG TTCATAGCCCATATATGG AGTTCCGCGTTACATAA 

ATCAArAaTTATCArTAGTTAATGCCCCAGTAATCAAGTATCGGGTATATACCTCAAGGCGCAATGrATTGAAT GCCATTTACCGGGCGGACCGACTGG~ 
' ^ INYGV 1SS ■ P I YGVPR V . T Y G K W P A v L T " 



CCCAACGACCCCCGCCCATTGACGTCAA ^"^^'^^^•^^ '^^'Q'^TCCCATAGTAACGCCAATAGGGACTTTCCArTGAC GTCAATGGGTGfiAflTATTTArr.-r* 

GGGTTGCTGGGGGCGGGTAACTGCAGTTATTACTGCATACAAGGGTATCATTGCGGTTATCCCTGAAAGGTAACTGCAGTTACCCACCTCATAAATGCCA 
^ Q . " P P P ' ° V N N ° V C S " ^ A N P 0 F p L T s . . . v F t v 

AAACTGCCCACTTGGCAGTACATCAAGTGTATCATATG CCAAGTACGCCCCCTATTGACGTCAATGACGGTAAA TGGCCCGCCTGGCATTATGr^AnT, 
TTTGACGGGTGAACCGTCATGTAGTTCACATAGTATACGGTTCATGCGGGGGATAACTGCAGTTACTGCCATTTAC CGGGCGGACCGTAATACGGGTCA T 

CATGACCTTATGGGAC T TTCCTACTTGGCAGTACATCTACGTATTAG TCATCGCTATTACCATGGTr,ATr.r^rT TTf; .^^ ... 

_ _ I I | | | . ■ ■ ■wux.nuiMV.MH.HHimibUllUGA 

gtactggaataccctgaaaggatgaaccgtcatgtagatgcataatcagIagcgataatggtaccactacgccaaaaccgtcatgtagtIacccgcacct 
hd lmglsylavhlr i s H B Y Y_ H . D A V L A V H 0 V A V 



tagcggtttgactc acggggatttccaagtctccacccc attgacg tcaatgggagtttgtt ttggcaccaaaatcaac gggactttccaaaatgtcgta 

ArCGCCAAACTGAGTGCCCCTAAAGGTTCAGAGGTGGGGTAACTGCAGTTACCCTCAAACAAAACCGTGGTTTTA GTTGCCCTGAAAGGTTTTACAGCAr 
' ' • L T G 1 5 " S P P " • » ° " 6 F V L A P K S T G , S K H S ■ 

.CAACrcCGCCCCATrGACGCAAATGGGCGGTAGGCGTG rACGGrGGGAGG T CTATATAAGCAGAGCTGGTrTAGT G AACC GrCAGATCCGCTA GCB r Ta 

tgtgaggcggggtaactgcgtttacccgccatccgcacatgccaccckcagatatatIcgtctcgaccaaatcacttggcagtctaggcgatcgcgat 

i . foangp, ■ A { " ^ V **SLYKQSVFSEPSDPLAL 

CCGGrCGCCACCATGGTGAGCAAGGGCGAGGAGCTG TTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGAC GGCGACGTAAACGGCCA C AAGT T r,. r , 
GGCCAGCGGTGG TACCACTCGTTCCCGCTCCTCGACAAG ^QC^CCACCACGGGTAGG ACCAGCTCGACCTGCCGCTGCATTrGCCGGTGrTrAAnr rfi^ 



-CO 



eGFPC.e.unc53xba 



P ' A . T " V S K G E E L F T G - V " ■ ^ V e L 0 G 0 V N G H . F 3 

TGTCCGGCGAGGGCGAG 5 GCGATGCCACCTACGGCAAG C TGACCCTGAAGTrCATCTGCACCACCGGCAAGCTGC CCGTGCCCTGGCCCACC ( : T rr. T ,.- 
ACAGGCCGCTCCCGCTCCCGCTACGGTGGATGCCGTrCGAC TGGGACTTCAAGrAGACGlG GTGGCCGTlcGACGGGCACGGGACCGGGlGGGAGrA.-T; ' 



-eGFPC.e.unc53xba 



VSGEGEGOATYGKL TLKFlCTTGKLPVpypT^yY 
CACCCTGACCTACGCCGTGCAGTGCTTCAGCCGCTAC C CCGACCACATGAAGCAGCACGACTTCTrCAAGrCCGC CAr G CCCGAAGGCTACr. TC rA^ 

g.^gactggatgccgca^cacgaagtcggcgatggggctg gtgtacItcgtcgtgcIgaag aagttcaggcggtacgggcttccgaIgcaggtcct; ssc 



-eGFPC.e.unc53xba 



' L " G V ° C F 5 R Y P 0 H M K 0 H 0 F F K S A H P E 6 Y v Q E 

CGCACCATCTTCTTCAA G GACGACGGCAACTACAAGACC C GCGCCGAGGTGAAGTTC G AGGGCGACACCCTGGTGAAC CGCATCGAGCTGAAr.GGCAT.-, 
GujTovjTAGAAGAAGrTCCrGCTGCCGTTGATGTTCTGGGC GCGGCTCCACTTCAAGCTCCC GCTGTGGGArrArTTfirrf:TA/;r T<-/-Aj-T-r<-y-J-/--r..-i 



- eGFPC.e.unc53xba
fig 33 pEGFPxba (1 > 5447) Site and Sequence

ACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGC 






TGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCC 



GAC AAGCAGAAGAACGGCATCAA 



TGAAGTTCCTCCTGCCGTTGTAGGACCCCGTGTTCGACCTCATGTTGATGTTG TCGGTGTTGCAGATATAG TACCGGCTGTTrGTrTTrTrr. 



GCCGTAGTT 



" ©GFPC.e.unc53xba — 

DFKEDGN lLGH KLEYNYNSHNVY IM 



A D K Q K N G I 



GGTGAACTTCAAGATCCGCCACAACATCGAGGA CGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCA TCGGCGACnfirrrm T/:rT/;fTf 
CCACTTGAAGTrcrAG G CG G TGrT G rAGCTCCTGCC G TCGCACGTCGAGCGGCTGGTGA;GGTCG TCTrGT GG ^ TA .;,^ T L — nrr .' 



12a 



V H F K 1 R . H " ' E OGSVQLAOHYOONT 



P IGDGPVLL 



CCCGACAACCACTACCT 



:TGA6CACCCAGTCC GCCCTGAGCA^AGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGA GTTCGTGACCGC^ 
GGGCTGTTGGTGATGGACTCGTGGGTCAGGCGGGACTCGTTTCTGGGGTTnrTrTTrf:rr:rT Ari-rr t a/-/- ■! — . _ i ' 1 ' ► 130! 




PDNHYLSTQS 



ALSKOPNEKROHM 



VLLEFVTAAG 



TCACTCTCGGCATGGACGAGCTGTACAAGTCCGGACTCAGATCTACGTCAAATGTAGAATTGATACCAATr 



TACACGGATTGGGCCAATCGGCACC TTTC 



agtgagagccgtacctgctcgacatgttcaggcctgagtctagatgcagIttacatctt^actatggttIgatgtgcctaaccc^t 



TAGCCGTGGAAAG 



- eGFPC.e.unc53xba



(TLGMDELY 



-C.e.unc53 xba 



K S G L R S T S N V E L r P I Y T D V A N R H L S 

GAAGGGCAGCTTATCAAAG ^^^^*^GGGATATTTCCA ATGATTTTCGCGACTATCGACTGGTTTCTCAGCTTATTA A TG TGATCGTTCCGATCAAPfiAA 
CTTCCCG ^^GAATAGTTTCAGCTAATCCCTATAAAGG T TACTAAAAGCGC tGATAGC T GACC AAAGAGTCGAATA ATTAr at T^nr* Kr.rr-r ^tti-/.t! 




1 5vX 



K G S L S K S ' R D isndfrdyrlv 



ITCGCCTGCATTCACGAAACGTTTGGCAAAAATCACArCGAACC 



aaoagcggacgtaagtgctttgcaaaccgtttttagtgtagcttggaccIaccggagctItgcacaga^t; 



TGGATGGCCTCGAAACGTGTCTCGACTACCTGAAAAATCTGGGTCTCGACrorT 



gatggactttttagacccagagctgacga 




S P A F T K R 



L A K I TSNLDGLE 



t cloylknlglo 




3 K L TKTDIDSGNL

[Figure: bar chart of phenotype classes «wild type», «vulva» and «mutant». Accompanying table columns: strains; phenotypes (wild type, vulva, mutant); number. Rows: wt (ALN, PLN); unc-53(n152) (ALN, PLN); unc-53(n152) pAUnc-53 (ALN, PLN); pAUnc-53-H1 (PLN).]

Enzymes : All 146 enzymes (No Filter)

Settings: Linear, Certain Sites Only, Standard Genetic Code
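
By way of illustration only, the layout of the sequence maps in these figures (upper strand, aligned complementary strand, and a translation under the Standard Genetic Code) may be sketched as follows. The function names `complement` and `translate` are hypothetical and are not taken from the mapping software used to prepare the figures; the sketch assumes a linear sequence read in a single forward frame.

```python
from itertools import product

# Standard Genetic Code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq):
    """Return the base-paired lower strand, aligned position by position (not reversed)."""
    return seq.translate(_COMPLEMENT)


def translate(seq, frame=0):
    """Translate the upper strand in one forward frame; unreadable codons become 'X'."""
    seq = seq.upper().replace(" ", "")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


if __name__ == "__main__":
    # Fragment read from the histidine-tag region of the map that follows.
    fragment = "atggggggttctcatcatcatcatcatcatggtatggctagc"
    print(fragment)
    print(complement(fragment))
    print(translate(fragment))
```

Applied to the fragment shown, the sketch yields a run of six histidine residues, in agreement with the residues annotated under the ProBond binding domain in the map.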

GACGGATCGGGAGATCrCCCGATCCCCTATGGTCGACTC TCAGTACAATC rGCTCTGATGCCGCATAGTTAA GCCAGTATCTGCTCCCTGC TTG TGTGTT 

CTGCCTAGCCCTCTAGAGGGCTAGGGGATACCAGC TGAGAGTCATGTTAGACGAGACTACGGCGTATCAATTCGGTCATAGACGAGGGACGAAC ACACAA ^ 
, T D R , E IS RSPMV OSQYNLL .CR IVKPVSAPCLCV 

GGAGGTCGCTGAGTAGTGCGCGAGCAAAATTTAAGCTACAACAAGGCAAGGCTTGACCGACAATT GCATGAAGAATCTGCTTAGGGTTAGGCGTTTTGCG 

CCTCCAGCGACTCATCACGCGCTCGTTTTAAATTCGATGTTGTTCCGTTCCGAACTGGCTGTTAACGTACTTCTTAGACGAATCCCAATCCGCAAAACGC ^ 
G G R ,• V V R E Q N L S Y N K A R L D R Q L H E E s A . G AFC 

CTGCTTCGCGATGTACGGGCCAGATATA CGCGTTGACATTGATTATTGACTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCATAGCCCAT^Ta 

1,1 1 1 1 1 1 1 1 1 1 1 1 i . . , , , | . TV 

GACGAAGCGCTACATGCCCGGTCTATATGCGCAACTGTAACTAATAACTGATCAATAATTATCATTAGTTAATGCCCCAGTAATCAAGTATCGGGTATAT 
^ A S R C T G Q I Y A L T i. I I D , L L IVlNYGVISS.Ply 

TGGAG TTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAATAATG ACGTATGTTCCCA TAGT 
ACCTC AAGGCGCAATGTATTGAATGCCATTTACCGGGCGGACCGAC TGGCGGGTTGCTGGGGGCGGGTAAC TGCAGTTATTACTGCATACAAGGGTATCA ^ 
G V P R Y ' * V G K V P A V L T A Q R P P PIDVNNDVCSHS 

AACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGACTATTTACGGTAAACTGCCCAC TTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCC 
TTGCGGTTATCCCTGAAAGGTAACTGCAGTTACCCACCTGATAAATGCCATTTGACGGGTGAACCGTCATGTAGTTCACATAGTATACGGTTCATGCGGG ^ 
NAN RDFPL TSMGGLFTVNCPLGSTSSVSYAKYA 



CCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGT^^ 

GGATAACTGCAGTTACTGCCATTTACCGGGCGGACCGTAATACGGGTCATGTACTGGAATACCCTGAAAGGATGAACCGTCATGTAGATGCATAATCAGT 6 °° 
PY . RQ - R , - MA RLALC PVHDLMGLSYLAVHLR I S H 

TCGCTATTACCATGGTGATGCGGTTTTGGCAGTACArCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATT TCCAAGTCTCCACCCCATTGACGTCAA 
AGCGATAATGGTACCACTACGCCAAAACCGTCATGTAGTTACCCGCACCTATCGCCAAACTGAGTGCCCCTAAAGGTTCAGAGGTGGGGTAACTGCAGTT ?0 ° 
RYY HGD AVLAV HQ VAV IAV . LTG ISKSPPH . RQ 

TGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCG TAACAAC TCCGCCCCATTGACGC AAATGGG CGGTAGGCGTG TACGGTGGGA3 

ACCCTCAAACAAAACCGTGGTTTTAGTTGCCCTGAAAGGrTTTACAGCATTGTTGAGGCGGGGTAACTGCGTTTACCCGCCATCCGCACArGCCACCCTC: ^ ' ' 
E F 7 L A P K 5 T G L S K M S . Q L R P I D a N G R . A C T V G 

gtc ta ta taagc agagctc tc tggc taactagagaacccactgcttac tggcttatcgaaattaatacgactc actatagg gagacccaagc tggc ta.:;c: 

CAGATATATTCGTCTCGAGAGACCGATTGATCTCTTGCGTGACGAATGACCGAATAGCTTTAATTATGCTGAGTGATATCCCTCTGGGTTCGACCGAT.-3 ^ 

I > 
|— T7 promoter priming site —|

GL YKQSS | LAN R THCL LAYRN . YDSL . GOPSWLA 

gtttaaacttaagcttaccatggggggttctcatcatcatcatcatcatggtatggctagcatgactggtggacagcaaa tgggtcgggatct^ 

CAAATTTGAATTCGAATGGTACCCCCCAAGAGTAGTAGTAGTAGTAGTACCATACCGATCGTACTGACCACCTGTCGTTTACCCAGCCCTAGACATGrTG " 

| > | 

|— ProBond binding domain —|

F *LKLT MGG SHHHHHHGMASMTGGQQMGROLYD
GATGACGATAAGGTACCTAGGATCCATGCAAATGAGGAGGAGGAGCCAGAGAAGAAGGAGG TATCGGAGCTGCGCTCTGAGCTATGGGAGAAGGAAATGA
CTACTGCTATTCCATGGATC CTAGGTA CGTTTACTCCTCCTCCTCGGTCTCTTCTTCCTCCArAGCCTCGACGCGAGACTCGATACCCTCTTCCTTrA."- K< 

> . I — 

1 | I I I i III l lll I I I 



POD K V P R I HA NE EEEPEKKEVSELRSELVEKEM 



AGCTTACAGACATCCGCTTGGAGGCCCrCAACTCTGCCCACCAACTGGATCAGCTTCGGGAGACCATGCACAACATGCAGTTGGA TGAA 
TCGAATGTCTGTAGGCSAACCTCCGGGAGTTGAGACGGGTGGTTGACCTAGTCGAAGCCCTCTGGTACGTGTTGTACGTCAACCTCCACCTGGACGACT" 




K L T D 1 R L E ALNSAHQLOQLRETMHNMQLE 



V D L L K 



AGC AGAGAATGACCGACTGAAGG TAGCCCCAGGCCCCTCATCAGGC TCCACTCCAGGGCAGG TCCCTGGATCA TC TGCATTAT CTTCCCCACGCCGCTCC 
TCGTCTCTTACTGGCTGACTTCCArCGGGGTCCGGGGAGTAGTCCGAGGTGAGGTCCCGTCCAGGGACCTAGTAGACGTAATAGAAGGGGTGCGGCGAGG 



- pLM5 insert = U3



-ORF U3 



A E N , D R LKVAPGPSSGSTPGQVPGSS 



ALSSPRR3 



CTAGGCCTGGCACTCACCCATTCCTTCGGCCCCAGTCTTGCAGACACAGACCTGTCACCCATGGATGGCATCAGTACTTGTGGrCCAAAGGAG G AA 
GATCCGGACCGTGAGTGGGTAAGGAAGCCGGGGTCAGAACGTCTGTGTCTGGACAGTGGGTACCTACCGTAGTCATGAACACCAGGTTTCCTCCTTCAC* 



- pLM5 insert = U3 



-ORF U3 



- G L A L T H 



3 f G p 3 L ADTDLSPMOGISTCGPKEEV 



CCC TCCGGGTGGTGGTGAGGATGCCCCCGCACCAC ATCA TGAAAGGGGAC T rGAAGCAGCAGGAATTCTTCCTGGGCTGTAGCAAGG TC AGTGGAAAAG~ 
GGGAGGCCCACCACCACTCCrACGGGGGCGTCGTG-AGTAGTTTCCCCTGAACTTCGTCGTCCTTAAGAAGGACCCGACATCGTrCCAGTrArrTTTrrl 




TLRVVV^MPPQH 



tKGDLKQQEFFLGCSKVSGf 



'GACTGGAAGAT GCTGGATGAAGC |G"^TTCCAAG~3TTCAAGGACTATATTTCTAAAATGGACCCAGCCTCTACCCTGGGAC TAAGCACTGAGTCCA 
ACTGACCTTCTACGACCTACTTCGACAAAAGGTTCACAAGTTCCTGATAT AAAGATTTTACCTGGGTCGGAGATGGGACCC TG ATTCGTGAC TCAGGTA^i 




— — ORF U3 - — 

P . W K . M I P E A v F Q V F KOYISKMOPASTLGLSTESI
fig 52 pLM5 (1 > 5425) Site and Sequence

CATGGCTACAGCATCAGCCACGTGAAACG AGTGTTGGATGCAGAGCCCCCCGAGATGCCTCCTTGCCGTCGAGGTGTCAATAACATATCAGTCTCCCTCA 
GTACCGArGTCGTAGTCGGTGCACTTTGCTCACAACCTACGTCTCGGGGGGCTCTACGGAGGAACGGCAGCTCCACAGTTATTGTATAGTCAGAGGGAG" " 



- pLM5 insert = U3



-ORF U3 



H G Y S I S H V K R V L D A E P P EMPPCRRGVNNISVSL 
AAGGTCTGAAGGAG AAATGCGTCGAC AGCCTGGTGTTCGAGACGCTGATCCCCAAGCCGATGATGCAGCACTACATAAGCCTCCTGC TGAAGCACCGGC1 : 

' ' ' 1 ' ) ■ ■ i I I I I I i I i I ii i I i ~\ !p-V 

TTCCAGACTTCCTCTTTACGCAGCTGTCGGACCACAAGCTCTGCGACTAGGGGTTCGGCTACTACGTCGTGATGTATTCGGAGGACGACTTCGTGGCCGC 



- pLM5 insert = U3



-ORF U3 



KGLKEKCVOSLVFETL I PKPMMQHY ISLLLJCHRR 

'■' ■ ' ' ' ■ ■ - - - - ' ■ ■ 

CCTCGTCCTCTCGGGCCCCAGCGGCACGGGCAAGACCTACCTGA CCAATCGCTTGGCCGAGTACCTGGTGGAGCGCTCTGGCCGTGAGGTCACAGAGGGC 
GGAGCAGGAGAGCCCGGGGTCGCCGTGCCCGTTCTGGATGGACTGGTTAGCGAACCGGCTCATGGACCACCTCGCGAGACCGGCACTCCAGTGTCTCCCG '"'^ 



- pLM5 insert = U3



-ORF U3 



LVLSGPSG TGKTYL TNRLAE YLVERSGR 



E V T E G 



ATCGTCAGCACCTTCAACATGCACCAGCAGTCTTGCAAGGATCTGCAACTGTATCTTTCCAACCTAG CCAACCAGATAGACCGGGAAACAGGAATTGGG: 
TAGCAGTCGTGGAAGTTGTACGTGGTCGTCAGAACGTTCCTAGACGTTGACATAGAAAGGTTGGATCGGTTGGTCTATCTGGCCCTTTGTCCTTAACCC: 



- pLM5 insert = U3



"~ — — ORFU3 — _ — . 

I VS TFNhH QQ SC KDLQLYLSNLANQIORETGt G 

ATGTGCCCCTGGTGATTCTATTGGATGACCTGAGTGAAGCAGGCTCCATCAGTGAGTTGGT CAATGGGGCCCTCACCTGCAAGTATCATAAATGTCCCTm 
:ACACGG3GACCACTAAGArAACCrACrGGACTCACTTCGTCCGAGGTAGTCACTCAACCAGTTACCCCGGGAGTGGACGTrCATAGTATrTACAGGG^- ~ ' 



-ORFU3 



&V PLV IL LDQ LSEA GSISELVNGALTCKYHKCPV 
TATTATAGGTACCAC CAATCAGCCTGTAAAAATGACACCCAACCATGGCTTGCACTTGAGCTTCAGGATGTTGACCTTCTCCAACAAf!nTr,r,Ar,rrA.;. - 

' — ' ' — ! 1 ■ 1 I 1 ' r -ill i 1 . 1 i I i i , " 

ATAATATCCATGGTGGTTAGTCGGACATTTTTACTGTGGGTTGGTACCGAACGTGAACTCGAAGTCCTACAAC TGGAAGAGGTTGTTGCACCTCGGTCGG 



■ pLM5 insert = U3 



1 " — — ORFU3 — — 

1 . 1 G T T NOPVKMTPNHGLHLSFRMLTFSNNVE



AATGGCT TCCTGGfTCGTTACCTGAGGAGGAAGCTGGTAGAG TCAGACAGCGACATCAATGCCAACAAGGAAGAGCTGCTTCGGGTCCTCG AC TGGGTA;; 

ttaccgaaggaccaagcaatggactcctccttcgaccatctcagtctg tcgctgtagttacggttgttccttc tcgacgaagcccacgagctgacccat:; 



-ORF U3 



N G F . L V R y I R R . * I V E S 0 S 0 I N A N K £ E L L R V L D 



W V 



ccaagctgtggtatca 



tctccacaccttccttgagaagcacagcacctcagacttcctcatcggcccttgcttctttctg tcgtg tcccat tggcattga 

GGTTCGACACCATAGTAGAGGTGTGGAAGGAACTCTTCGTGTCGTGGAGlcfGAAGGAGlAGCCGGGAACGAAGAAAGACAGCAC AGGGTAACCGTAACT 

■ pLM5 insert = U3 ~ _ 



ORF U3 

P *, I V YHLHTFLEKHSTSD 



F I IGPCFFLSCPIGIE 



GGACTfCCGGACC TGGTTCATTGACCTGTGG 

CCTGAAGGCCTGGACCAAG TAACTGGAC ACCTTGTTGAGATAGTAAGGGaTAGATGTCCTTCCTCGGTTCCTACCCTATTTCCAGGTACCTG TC TT TCGA 



25C-: 



-ORFU3 



° F R T V F 1 ° L y N N S ' IP YLQEGAKDG I K V H G Q K A 
GCTTGGGAGGAC CCAGTGGAATGGGTCCGGGACACACTTCCC TGGCCATCAGCCCAACAAGACCAATCAAAGCrGTACCArrTGrrrrr.rrrArrr.r.:, 



cgaaccctcctgggtcaccttacccag gccctgtgtgaag^accggtagtcgggttgttctggttagtttcgacatggIggacggggg^ 

■ pLM5 insert = U3 



AWEOPVEVVRDT 



-ORF U3 



LPWPSAQQOQSKLVHLP 



P P T V 



GCCCTCACAGCATTGCCTC acctcccga ggataggac AGTCAAAGACAGC ACCCCAAGTTCTCTGGACTCAGATCCTCTGATGGCCATGCTGCTGAAACT 
CGGGAGTGTCGTAACGGAG TGGAGGGCTCCTATCCTGTC AGTTTCTGTCGTGGGGTTC AAGAGACC TGAGTCTAGGAGAC TACCGGTACGACGACTTTt'll 



- pLM5 insert = U3 



^ P H S I A S P 



-ORFU3 



P£ , 0RTv '<0S TP5 SLDSDPL M A M L L K L 
TCAAGAAGCTGCC AAC ^^^"^GAGTCTCCAGATCGAGAAACCATCCTGGACCCCAACCTTCAGGCAACACTT TAAGGGTTCGGCAATCAC TGTCACCCC 

agttcttcgacggttgatgtaactcag aggtctagctctWggtaggacctggggtt ggaagtccgttgIgaaattcccaagccgttag^ 

• pLM5 insert = U3 



0 E A A N Y 



-ORF U3

IESPDRE T ILOPNLO 



A T L 



G N H C H P 




CG3ACAGCAGAAC6CTGGCATCAGCTATCTTAGCTCCTCCTCTCCCCTCTCCTCTTTCAGAGCACTGGC TC TCCAGCCCCAGGAGGAGAACAGGAGGGA3 
GCC'GTCGTCTTGCGACCGTAGTCGATAGAATCGAGGAGGAGAGGGGAGAGGAGAAAGTCTCGTGACCGAGAGGTCGGGGTCCTCCTCTTGTCCTCCCTC 



-pLM5 insert = U3 



R T A E R V H Q L S . L L L S P I L F Q S T G S P A P G G E Q E G 

GAGGAGATGAAAGAGGAGGGACAGGTTCTTGGTGCTGTACCTTTGAGAACFTCCTAGGAAGGAA TGGTGGGGTGGCGTTTGGGAACTTGTGCCCCCTAAA 
C TCCTCTAC TTTCTCC TCCCTGTCCAAGAACCACGAC ATGGAAACTCTTGAAGGATCC TTCC TTACCACCCCACCGCAAACCCTTGAACACGGGGGATTT 



- pLM5 insert = U3 



GGDERGGTGSVCC TFENFLGRNGGVAFGNLCPLM 

• ■ ■ ■ ' ' . > . . . . ■ . . . ■ . ■ . ■ . , . . 

CACATTTACTGGCCTCCTC TAGAGGGCCCGTTTAAACCCGCTGATCAGCC TCGACTGTGCC TTCTAGTTGCCAGCCATCTGTTGTTTGCCCC TCCCCCGT 
GTGTAAATGACCGGAGGAGATCTCCCGGGCAAATTTGGGCGACTAGTCGGAGCTGACACGGAAGATCAACGGTCGGTAGACAACAAACGGGGAGGGGG'"A ^ 

- pLM5 insert = U3

T F T G LL RARLNPL I SLDCAF .LPA I C C L P L P R 

1 ' ■' ' 1 . A ■ ■ > i .... 1 ... . , ■ 1 I — . > , . 1 . 

GCCTTCCTTGACCCTGGAAGGTGCCACTCCCACTGTCCTTTCCTAATAAAATGAGGA AATTGCATCGCATTGTCTGAGTAGGTGTCATTCTATTCTGGGG 
CGGAAGGAACTGGGACCTTCCACGGTGAGGGTGACAGGAAAGGATTATTTTACTCCTTTAACGTAGCGTAACAGACTCATCCACAGTAAGATAAGACCCC 
Afr LOPGRCHSHCPFL[K.GNClALSE.VSFYSG 



GGTGGGG TGGGGCAGGACAGCAAGGGGGAGGATTGGGAAGACAATAGCAGGCATGCTG GGGATGCGGTGGGCTCTATGGCTTCTGAGGCGGAAAGAACCA 
CCACCCC ACCCCGTCCTGTCGTTCCCCCTCCTAACCCTTCTGTTATCGTCCGTACGACCCCTACGCCACCCGAGATACCGAAGACTCCGCC tttcttggt 
GVGGAGQGGGGLGRQ.QACVGCGGLYGF.GGKMO 



gctggggctctagggggtatccccacgcgccctgtagcggcgcattaagcgcggcgggt gtggtggttacgcgcagcgtgaccgctacacttgccagcgc 

CGACCCCGAGATCCCCCATAGGGGTGCGCGGGACATCGCCGCGTAATTCGCGCCGCCCACACCACCAATGCGCGTCGCACTG3CGATG7GAACGGTCGCG 
L G L GV3PRAL . RR I KRGGCGGYAQRDR YTCQP 



CCTAGCGCCCGCTCCTTTCGCTTTCTTCCCTTCC TTTCTCGCCACGTTCGCCGGCTT TCCCCGTCAAGC TCTAAATCGGGGC ATCCCTTTAGGGTTCCGA 
GGATCGCGGGCGAGGAAAGCGAAAGAAGGGAAGGAAAGAGCGGrGCAAGCGGCCGAAAGGGGCAGTTCGAGATrTAGCCCCGTAGGGAAATCCCAAGGC- 
; ' S A R S F R F L P F L SRHVRRLSPSSSKSGHPFRVP 

TTTAG TGCTTTACGGC ACCTCGACCCCAAAAAAC T TGAT TAGGGTGATGGTTCACGTaGTGGGCC ATCGCCCTGATAGACGGTTT TTCGCCC TTTGACGT 
AAATC ACGAAATGCCGTGGAGCTGGGGTTTTTTGAAC TAATCCCAC TACCAAGTGCATCACCCGGTAGCGGGACTATCTGCC AAAAAGCGGGAAAC TGCA 
1 CFTAP RPQ KT . LG VFT.VAIALIOGFSPFDV 

TGGAGTCCACGTTCTTTAATAGTGGACTCTTGTTCCAAACTGGAACAACACTCAACCCTATCTCG GTCTATTCTTTTGATTTATAAGGGATTTTGGGGAT 
ACC 7C AGGTGCAAGAAATTATCACC TGAGAAC AAGGT TTGACCTTGTTGTGAGTTGGGATAGAGCCAGATAAGAAAAC TAAATATTCCC TAAAACCCCT* 
0 V H f V L . - V T LVPNVNNTQPYLGLFF F I R D F G D 

TTCGGCC TATTGGTTAAAAAATGAGCTGATTTAAC AAAAATTTAAC GCGAATTAATTCTGTGGAATGTG TGTC AG T TAGGC T G 7GGAA AG TCCCC AGGC " 
AAGCCGGATAACCAATTTTTTAC TCGAC TAAATTG TT TT TAAATTGCGCT TAATTAAGACACCT TACACACAG TCAATCCCACACCTT7CAGGGGTCCGA 
F 5 I I V KK. A D L T K I PEL I L V M V C 0 L G C G K 3 P G

" CCCCAGG " GGC ^^ G ^[GCAAAGCAT GCATCTCAATTAGTCAGCAACCAGGTGTGGAAAGTCCCCAGGCTCCCCAGCAGGC AGAAGTATGCAAAGr-A 

ggggtccgtccgtcttcatacgtttcgtacgtagagttaatcagtcgttggtccacacctttcaggggtccgaggggtcgtccgtcttcatacgtttcg; ^ 

S " G " ° K Y A K H * ' Q L V S N Q V w K V p R L p S r Q fc y A , „ 

TGCArCTCAATTAGTCAGCAACCATAG TCCCGCCCCTAACTCCGCCCATCCCGCCCCTAACTCCGCCCAGTTCCGCCCATTCTCCGCC CCATGSCTGACT 
ACGTAGAGTTAATCAGTCGTTGGTATCAGGGCGGGGATTGAGGCGGGTAGGGCGGGGATTGAGGCGGGTCAAGGCGGG TAAGAGGCGGGGTACCGACTGA ^ 
A S 0 L V S N H S P A P N S A H P A P N S A Q f R p F s A P v L T 

AATTTTTTTTATTTATGCAGAGGCCGAGGC CGCCTCTGCCTCTGAGCTATTCCAGAAGTAGTGAGGAGGCrTTrTTGGAGGC CTAGGCTTTTGCAAAAAG 

TTAAAAAAAATAAATACGTCTCCGGCTCCGGCGGAGACGGAGACTCGATAAGGTCTTC ATCACTCCTCCGAAAAAACCTCCGGATCCGAAAACGrTTTT'* 
" F F ' L ° ? G " G R L C «■ • A . P E V V R R L F w B p „ L L Q K * 

CTCCCGGGAGCTTGTATATCCATTTTCGG ATCTGATCAAGAGACAGGATGAGGATCr.TTTrr,rAT^«TT^^<-.„. T ^.^ 

GAGGGCCCTCGAACATATAGGTAAAAGCCTAGACTAGTTCTCTGTCCrACTCCTAGCAA^CGTACTAACTTGrTCTAciTAACGTGCGlcCAAGAGGrr ^ 
A G 5 L Y ' " F " ' • S R ° " " » I V .« H 0 . T , W , A P R F S ^ 

CCGCTTGGGTGGAGAGGCTATTCGGCTATGA CTGGGCACAACAGACAATCGGCTGCTCTGATGCCGCCGTGTTCCGGCTGT CAGCGCAGGGGCGCCCGGT 

ggcgaacccacctctccgataagccgatactgacccgtgttgtctgttagccgacgagactacggcggc acaaggccgacagtcgcgtccccgcgggcca t30C 

■ " L °. G E A ' " L • L G T T » «■ L ■ C R R V p A V S A G A P G 

TCTTTTTGTCAAGACCGACCTGTCCGGTGC CCTGAATGAACTGCAGGACGAGGCAGCGCGGCTATCGTGGCTGGCCACGA CGGGCGTTCCTTGCGCAGCT 

agaaaaacagttctggctggacaggccacgggacttacttgacgtcctgctccgIcgcgccgatagcaccgaccggtgcIgcccgcaaggaacgcgkg; 

S ' C ° " P V " C P E • T A G 8 G S A A , V a G H 0 G R S L R S 

GTGCTCGACGTT G rCACT G AAGCGGGAAGGGA CTGGCTGCTArTGGGCGAAGT G CCGGGGCAGGATCTCCTGTC ATCTCACCTTGCTCCTGCC Ga GAA,: ; 

cacgagctgcaacagtgacttcgcccttccctgaccgacgataacccgcItcacggccccgtcctagaggacagtagagIggaacgaggacggctctttc ^ 

CARRCH • ^ G K G L A A 1 G R S A G A G 5 P V 1SPC5CPE3 
TATCCATCATGGCTGATGCAATGCGGCGGCTGCAT ACGCTTGATCCGGCTACCTGCCCATTCGACCACCAAGCGAAACA TCGCATCGAG.rGAGrArr.T.- 

a.aogtagtaccgactacgttacgccgccgacgtatgcgaactaggccgatggacgggtaagctggtggItcgcIttgtagcgtagctcgctcgtg-atg 

-' G • C A A A A Y A ■ 5 G ' »• " ■ « P P S E T s H R A S t V 

TC3GATGGAA G CCGGTCTTGTCGA r CAGGATG ATCTGGACGAAGAGCATCAGGGGCTCGCGCCAGCCGAAC TGTTCGCCAGGCTCAAGr.CGCGCAT,r,-- 

AGCCTACCTTCG G CCAGAACAGCTAGTCCTACTAGACCrGCTTCTCGTA G TCCCCGAGCGCGGTCGGCTTG ACAAGCGGTCC G AGrTCCGCGCG r ACG'j'; 
^DGSRSCRSG . S G RRASGARASR T'VRQAQGaha" 

GACGGCGAGGATCTCGTCGTGACCCATGGCGATGCC TGCTTGCCGAATATCATGGTGGAAAArGGCCGC TTTTCTGGAT TCATCGACTG T~GGCC5GCTG~ 

CTGCCGCTCCTAGAGCAGCACTGGGTACCGCTACGGACGAACGGCTTATAGTACCACCTTTTACCGGCGAAAAGACCTAiGTAGCTGACACCGGCCGACC ^ 
P- P fl G S RR DPvrc^ LAE Y HGGK VPLFW 1 HRLWPAG 

GT G TGGCG G ACCGCTATCAGGACATAGCGTTG GCTACCCGTGATATTGCTGAAGAGCTTGGCGGCGAATGGGCTGACCGC TTCCTCGTGrTrTA-- G GTAT 
CACACCGCCTGGCGATAGTCCTGTATCGCAACCGATGGGCACTATAACGACTTCTCGAACCGCCGCTTACCCGACTGGCGAAGGAGCACGAAATGCCatI ^ 
~ ' " L S 8 " S V G ' • ' C ■ P A V R R M 6 . p , p R , L , y 

CGCCGCTCCCGaTTCGCAGCGCATCGCCTTCTA TCGCCTTCTTGACGAGTTCTTCTGAGCGGGACTCTGGGGT rCGAA ATGACCGACCAAGrfiA"f;TfM 

OC^CGAGGGClAAGCGTCGCGTAGCGGAAGATAGCGGAAGAACTGCTCAAaAAGACrCGCCCTGAGACCCCAAGCTTTACTGGCTGGTT'Grr-,,^ ^ 
— ? F AA H.RLL 3 P S . R VL L S G r L G P E „ T 0 „ \" T P 




ACCTGCCATCACGAGATTTCGATTCCACCGCCGCCTTCTATGAAAGGTTGGGCTTCGGAATCGTTTTCCGGGACGCCGGCTGGATGATCCTCCA GCGCCi; 

tggacggtagtgctctaaagctaaggtggcggcggaagaIactttccaacccgaagccttagcaaaaggccctgcggccgacctactaggaggtcgcg,-J 51 a 

N L . P S » ° F . ° S T * * P * E " L G F G I V F R 0 A G W M I L 0 R "fi 

GGATCTCATGCTGGAGTTCTTCGCCCACCC CAACTTGTTTATTGCAGCTTATAATGGTTACAAATAAAGCAATAGCATCACAAATTTC ACAAATAAAr.,-, 
CCTAGAGTACGACCTCAAGAAGCGGGTGGGGTTGAACAAATAACGTCGAATATTACCAATGTTTATTTCGTTATCGTAGlGrTTAAAGTGTTTATTTCGT ^ 

K A 



0 L H L E F F A H P N L F I A A Y M 6 Y K.SNSITNFTN 



TTTTTTTCACTGCATTCTAGTTGTGGTTTGTCCAAACTCATCAATGTATCTTATCATGTCTGTATACCGTCGACCTCTAGCTA GAGCTTGGCGTAATCAT 
AAAAAAAGTGACGTAAGATCAACACCAAACAGGTTTGAGTAGTTACATAGAATAGTACAGACATATGGciGCTGGAGATCGATclcGAACCGCATTAGTA *"* 



p SLHSSCGLSKLlN 



v S Y H V C IPSTSS.SLA 



VS - l-FPy . NCYPLTIPHN IRAGSIKCKAVG 



GGTCATAGCTGTTTCCTGTGTGAAATTGTTATC CGCTCACAATTCCACACAACATACGAGCCGGAAGCATAAAGTGTAAAGCCTG GGGTGCCTAATGAGT 

ccagtatcgacaaaggacacactttaacaataggcgagtgttaaggtgtgttgtatgctcggccttcgtatttcacattIcggaccccacggattactca 5Ua 

A . . V 

GAGCTAACTCACATTAATTGCGTTG 

1 ' i i » 5425 
CTCGATTGAGTGTAATTAACGCAAC 

S . L T L I A L
fig 53 pLM6 (1 > 4947) Site and Sequence
Enzymes : All 146 enzymes (No Filter) 

Settings: Linear, Certain Sites Only, Standard Genetic Code 

GTGGC ACTTTTCGGGGAAA rGTGCGCGGAACCCCTATTTGTTTATTTTTCTAAATACATTCAAATATGTATCCGCTC ATGAGACAATAACCC TGATAAA" 

CACCGTGAAAAGCCCCTTTACACGCGCCTTGGGGATAAACAAATAAAAAGArTTATGTAAGTrTATACAfAGGCGAGTAC TCTGTTATTGGGAC TAT7TA 
V H F 5 G K C A R N P Y L F I F L N T F K YVSAHETITLITI 

GCTTC AATAATATTGAAAAAGGAAGAGTATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGGCATTTTG CCTTCCTGTTTTTGC TCm-Z 
CGAAGTTATTATAACTTTTTCCTTCTCATACTCATAAGTTGTAAAGGCACAGCGGGAATAAGGGAAAAAACGCCGTAAAACGGAAGGACAAAAAC6AGT3 ^ 
A S I I U K K £ E Y E Y S T F P C R P Y S LFCGILPSCFCS 

CCAGAAACGCTGGTGAAAG TAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAACTGGATCTCAACAGCGGTAAGATCCTTGAGAfiT- 
' M ' 1 — ' ' 1 1 1 ' 1 ' 1 i ■ ■ t .i i i | — , i | 

GGTCTTTGCGACCACTTTCATTTTCTACGACTTCTAGTCAACCCACGTGCTCACCCAATGTAGCTTGACCTAGAGTTGTCGCCATTCTAGGAACTCTCAA 
p B M A G E S K R C . R S V G C T S G LHRTGSQQR .OP EF 

TTCGCCCCGAAGAACGTTT TCCAATGATGAGCACTTTTAAAG TTCTGCTATGTGGCGCGG TA TTATCCCGTATTGACG CCGGGCAAGAGCAACTCGGTCG 

AAGCGGGGCTTCTTGCAAAAGGTTACTACTCGTGAAAATTTCAAGACGATACACCGCGCCATAATAGGGCATAACTGCGGCCCGTTCTCGTTGAGCCAGC 
S P R R T F S N 0 E H F . 5 S A M W R G I IPY.RRARATRS 

CCGCATACACTATTCTCAGAATGACTTGGTTGAGTAC TCACCAGTCACAG AAAAGCATCTTACGGATGGCAT GACAGTAAGAGAATTATGCAGTGC TGCC 
GGC6TATGTGATAAGAGTCTTACTGAACCAAC TCATGAG TGGTCAGTG TC TTTTCGTAGAATGCCTACCGTAC TGTCATTCTCTTAATACGTCACGACGo 5 °° 
P H T > F S E ' 1 G . • V L T S H R K A S Y GVH0SKR1MQCC 

ATAACCATGAGTGATAACACTGCGGCCAACTTACTTCTGACAACGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGCAC AACATGGGGGATCATGTAH 
TATTGGTACTCACTATTGTGACGCCGGTTGAATGAAGACTGTTGCTAGCCTCCTGGCTTCCTCGATTGGCGAAAAAACGTGTTGTACCCCCTAGTACATT ^ 
H " H £ ' • H , C G Q > TSONDRRTEGANRFFAQHGGSCN 



CTCGCCTTGATCGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGACACCACGATGCCTGTAG CAATGGCAACAACGTTGCGCAAACT 
GAGCGGAACTAGCAACCCTTGGCCTCGACTTACTTCGGTATGGTTTGCTGCTCGCACTGTGGTGCTACGGACArCGTTACCGTTGTTGCAACG 

S P • S L G T G A E -SHTKRRA.HHDACSNGNNVAQT 



ATTAACTGGCGAACTACTTACTCTAGCTTCCCGGCAACAATTAATAGACTGGATGGAGGCGGATAAAGTTGCAGGAC CACTTCTGCGCTCGGCCCTTCC^ 
TaaTTGACCGCT TGATGAATGAGATCGAAGGGCCGTTG7TAATTATCTGACCTACCTCCGCCTATTTCAACGTCCTGGTGAAGACGCGAGCCGGGAAGG- a 
1 H V R T T Y 3 5 F P A T I N R L D G G G . S C R T T S A L G P 3 

GCTGGCTGGTTTATTGCTGATAAATC TGGAGCCGGrGAGCGTGGGTCTCGCGGTATCArTGCAGCACTGGGGCCAGATGGTAAGCCCTCCCG TATCGTAC- 
CGACCGACCAAATAACGACTATTTAGACCTCGGCCACTCGCACCCAGAGCGCCATAGTAACGTCGTGACCCCGGTC TACCATTCGGGAGGGC ATAGCA jz ^ 
GV , LVYC t - IV SR.AUVS RYHCSTGARV , ALPYRo 

TTATCTACACGACGGGGAG |CAGGCAACTATGGATGAACGAAATAGACAGATCGCTGAGATAGGTGCCTCACTGATTAAGCAT TGGTAACTGTCAGACCm 

aatagatgtgctgcccctcagtccgttgatacctacttgctttatctgtctagcgactctatccacggagtgactaatt^ :osc 

Y L H D G E , S G N Y G , T K . T D R . D R C L T 0 . A L V T V R P 

agtttactcat atatactttagattgatttaaaacttcatttttaatttaaaaggatctaggtgaagatcc 

tcaaatgagtatatatgaaatctaactaaattttgaagtaaaaattaaaItttcctagaIccacttctaggaaaaactaWagagtact^ lX 

S L L - ' Y T \ D • F , * T S / L I . K D L G £ D p F , . S H D 0 N P 
TAACGTGAGT TTTCGTTCCAC TGAuCGTCAGACCCCGTAGAAAAGATCAAAGGATCTTC rTGAGATCCTT TTTTTC TGCGCG T AA TC TGC TGCTTGC'AA» 

a t tgc ac tcaaaagcaaggtgac tcgcagtctggggc atct t ttctagtttcc tagaagaac rc Jaggaaaaaaagacgcgc attagacgacgaacgt t ~ 

L 1 • y F V P | t- S V R P R R K 0 0 R I F L R S F F 3 A R M L l A fl

CAAAAAAACCACCGCTACCA6CGGTGGTTTGTTTGCCGGATCAAGAGCTACCAACTCTTTTTCCGAAGGTAAC7GGCTTCAGCAGAG CGCAGATACCAAA 
GTTTTTTTGGTGGCGArGGTCGCCACCAAACAAACGGCCTAGTTCTCGATGGTTGAGAAAAAGGCTTCCArTGACCGAAGTCGTCTCGCGTCTATGGTTr 
K K T T A T S G G L F A 6 S R A T N S FSEGNWLOQSAOTt 



TACTGTCCTTCTAGTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCTGCTAATCC 



TGTTACCAGTGGCTGCT 



atgac aggaagatcacatcggcatcaatccggtggtgaagttcttgagacatcgtggcggatgtatggagcgagacgattaggacaatggtcaccgacga 18 

S G C 



VCP SSVAVVR pp[_Qg L C S T A V I P R s A N P V T 



GCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGACTCAAGACGATAG TTACCGGATAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCGTGCACAfAKr 



:tattccgcgtcgccagcccgacttgccccccaagcacgtgtgtcg 



C ° W R ■ V V . S Y R V G ■- * T I V T G . G A A V G L H G G F V H T 

ccagcttggagcgaacgacctacaccgaactgagatacctacagcgtgagctatgagaaagcgccacgcttcccgaagggagaaaggcggac^ 



A 



:aggtatcc 

ggtcgaacctcgcttgctggatgtggcttgactctatggatgtcgcactcgatactctttcgcggtgcgaagggcttccctctttccgcctgtccatagg l60,: 

G G Q V S 



Q L G A N D L H R T E I PTA.AMRKRHASRREK 



GGTAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTATAGTCCTGTCGGGTTTCGCCArrTr 



TGACTT 



ccattcgccgtcccagccttgtcctctcgcgtgctccctcgaaggtccccctttgcggaccatagaaatatcaggacagcccaaagcggIggagactgaa ,7CC 

CKR OGRNRRA HEGASRGKRL VSL .SC R V S P P L T 
GAGCGTCGATTT TTGTGATGCTCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCTGGCC TTTTG 

ctcgcagctaaaaacactacgagcagtccccccgcctcggatacctttttgcggtcgttgcgccggaaaaatgccaaggaccggaaaacgaccggaaaac '** 

■ AS LVR GAEPMEKRQQ R6LFTVP G L L L A F C 

CTCACATGTTCTT TCCTGCGTTATCCCCTGA7TCTGTGGATAACCGTATTACCGCCTTTGAGTGAGCTGATACCGCTCGCCGCAGCCGAACGA CCGAGCG 
GAGTGTACAAGAAAGGACGCAATAGGGGAC7AAGACACC TATTGGCATAATGGCGGAAACTCACTCGACTATGGCGAGCGGCGTCGGCTT6C TGGCTC6C ^ 
S " V . L S C V 1 P • F C S. • P V' " L ■ V S - . Y R S P Q P H Q B A 

CAGCGAGTCAGTG AGCGAGGAAGCGGAAGAGCGCCCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGATTCATTAATGCAGC TGGCACGACAGGTTT 

GTCGCTCAGTCACrCGCTCCTTCGCCTTCTCGCGGGTTATGCGTlTGGCGGAGAGGGGCGCGCAACCGGCTAAGTAATTACGTCGACCGlGCTGTCCAAA ^ 
R v S E R G S G R A P N T Q T A S P R A L A O S L M Q L * P 0 V 

CCCGACTGGAAAGC GGGCAGTGAGCGCAACGCAATTAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCTTTACACTTTATGCTTCCG GCTCGTATGT 
GGGCTGACCTTTCGCCCGTCACTCGCGTTGCGTTAATTACACTCAATCGAGTGAGTAATCCGTGGGGTCCGAAATGTGAAATACGAAGGCCGAGCATACA ^ 
S * L E S G Q_ . A 0 R N . C E LAHSLGTPOFTLTASGSYV 



TGTGTGGAATTG TGAGCG6ATAACAATTTCAC ACAGGAAACAGCTATGACCATGATTACGCC AAGCGCGCAATTAACCCTCACTAAAGGGAACA AAAGCT 

acacaccttaacactcgcc tattgttaaagtgtgtcctttgtcgatactggtactaatgcgg ttcgcgcgttaattgggagtgatttcccttgttttcga ; ' 2a 

" W " C E R 1 T » S H P K Q L.P.LRQARN.PSLKGTKA 



GGGTACCGGGCCCCCCCTCGAGGTCGACGG TATCGArAAGCTTGATATCGAATTCCTGCAGCCCGGGGGATCCATGCAAATGAGG AGGAGGAGCCAGAGA 
CCCAIGGCCCGGGGGGGAGCTCCAGCTGCCATAGCTATTCGAACTATAGCTTAAGGACGTCGGGCCCCCTAGGTACGTTTACTCC 



TCCTCCTCGGTCTCT 



- U3 stuk 



GYRAPPRQpp YR • a ■ Y R I PAA RGIHANEEEEPE 
Settings: Circular; Certain Sites Only; Standard Genetic Code

ATCACCATCAmCCKCMCmGCATOCaCCAOWmGATATCAACCrTATCGATACCCTCGAC^ 

CA C ATCCATTAT6CCAC CCCCCGTTTCTA ACTGAfTrrrMTTTTCACTTTAC 5ACTACAA AA ATGTGTTCTTTAA TAACTAT<TTCCA£TTGACTCTATT 209 
CTCTATGACT AC7TGTTGACTC ATTTTT CAT7CAG AAAAT ATTAAA AG GAACATTATTTA CXTTGCTTATTTGCCCT MCTTTGATTTJUrTTTTTCWTC 300 
AACTAGATCTT ACAAAACTTCC AATAC AATTCCATTTTCACATTACCCT CGCCA C GTGTCG CCACGTCAGCAACCGCTTCAG CAACTAACCCAAATTCCA 44)0 
ACTTTC CA CAAATGTC A AC ATC CA G GCT7 C A G A CTC CA CA GTC hA GAATA TCGAAAATTGGT AA GAaTTTT A7TT7 G AGCT C AAA CTTGTA T AAAATGCC 569 
CA GAAAAG AAGATGATAAA AATGTA CTTTTTn (X AAAACTTC CACCTTTATTGCTCTAATATC ACCGCTTAT ATCTCAATTTTCTTGACTTTTATCAAA 600 
AAATTTf CCACTATACAAA rGTAGAAAAGTATTTTGCAtAAATrnGTCAGTTGACAG 7€d 
AGTOGAAGGAGCGCAAGTCTATACTGCAAATAATG^TCTGAAAC^ m 
CACTAXjTTTCAGAACCTTCCTTTTTGTATGAAAAA^ 90^ 
7GCAATTTACAAATCCKCTCCCCTTGCC CG AAAAGTGCCCACCAAAATCAATTTCTC GGCTTCATAATGACTTT7 AAATTGATCTGAGAAAACACACAAG 1000 
AGGCTAACTAMTTGACAGfiGACAGGrrGTCCCTCTTCTCCCTCCTTCTCCCGCCTCCTCCT 1108 
TTC"TCCATrrrGCTTATAAACATTTGTGTCT GGAAGGAAACTACACGCGGAGACCGTCAA 1TAATTCGAATGAG AGCA TGGCAATTACT CTTTCGGAAAT 1200 
TGATGAA TAAA GATA6AGCCGA7 GACACTGGCTGGT AG7 AGTATGAGTGTAG AATTGCTTTTTCATCGTC7 CAACTTGCGCATGAGTCTTCCCCCCCTCT 1300 
CATCACTGACAATTAATGTCGGGTTTTATCCGCTCrTT^ 14*9 
ATTTATTTTGATArrrAATTTT(TTGCAAT^ 1MW 
TTCAAAAAA TCAATAAATMTC C CTAAC AAA TT GT A TGGCTAA A ATT77A TTT CTACTGTTGAC AA Ta TCTTTAT A TGT ATCACTGTTTTCCATCTCAAA 1000 
ACCTTGAATCCCCCAAGTTATAG<^GCTCCGTGT(>CATTTC^ 1700 
TTATfGGCCACGCGTAATAAAGTCCA>GCAGTTAGAATTTTM 1809 
rTTTCCATGCTTTn"GGCCCAT7AAAA^CrnCTC*CCTCT7CATCCATC^ 1900 
AGAAG6AGATACTGAGCCACA I'GGCGTCTGACCCnTTCATCTCGTCCCT AACTCTf CAAA TAGCCATAGACCTC 2000 

CTTGTTTTCTTCTTCGTTTTGACTCGCGCCTATTTTTTG^^ 

AAAAA CGAGGCAJITTTAAAAATATTAAAA TT AATCAG GTT GTAGATGTAGA TTTGGAAAAGAAG AAAAAAACAAAACaAATAGGAA CCGCCAGATCAAAA 2200 
TTCTATTTAAA G GTTTTC AA6AT<PTT AG GC AAGATTC GGCTGA ACAG AA AACTGAAGTGC CTGC A7AAATCT AGTGT AACGTTTAGATTGAACTCGGAA 2300 
ATCCT AAGCCTGAACTATACCC7TATTCT AGATCTTAGTTGCGCATAAGCTCAAGCCCAAGCAGAAATGACTTGCA 2400 
GCTTCCTTCAGTCTAATCCAGACTAGATTTCCAAGAGAGTTTTCAATT^ 2500 
AAAATCGTTATCCCTTTCTCTCACACTTTCAATTACAGATTCATCAAAGATTGGTATCW 2600 
ACTTCATC^AATAATACAAATTCATTCCCTrCCGTCWGCCff 2700 
TCCGATCCrrCCGGCTTCTTTTTAGAAATTATATTllTTCAiGAATCATCA^ 2«K> 
AAAACCTTCTAGACCACAaACCCA GCTAGTTCCTGTT G CTACAACTACAAA AATC G GAAGCTC AAAGCTAGCC GCTC i'GAAA 6CC OTGAG CA CCCCAAAA 2900 
CTOiCTTCTGTGAA^CTATTGGAGCA^AACAAGAGCC^ 3000 
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3100 
3200 
33(90 



CATCmCTaTCOUTAGCCUCAACCTACGAG^^ 

GAA GCCGCCGACCAGTAAGCTGG GAAGTGCCACGTCTATGTCGAAGCTTTGTAC G CCAAAAGTTTC CTA CCCTAAAAC G GACGC CCCAATtATATCTCAA 
CAAGAaCGAAAC^TKTCAAAGAG^ 

GCATCCATTCCACATCTTCCAAWGTTCAACGTCAWCGAAAAGTCTCCCTCATCiGACG J408 

GCCGATAGCCGCAACACC&GTTTCTCCAAATATTAtCAACAAGCQGTT(UGGAAAAAC 35W 

GAKCACaCCAGCT<mTCGCaCCTGA<^^ 3€W 

CTCAAAAACCAGAACCTGAAAAGCTCCAATCAATGAIXATCMCACGACGGACGT^ 37W 

TTCAATCCCAOACCACCAACGTACGATGTTmaAAAACAAG(yUAAATCACATCGCnGTC^ 34W 

GAaCCATTGTGOTaTGCCTCGGCTCAGCTC^aCCGCCCkACAAAAACmTGCTAATCAm 3993 
AATCC^GCGGCrACACrrrrfiArrjvr.rT/rrTfi^^ 

'^"««^*™^i«*^wMii»i-u«iM«*^Ai*jAl.f*.*#iuUAULALAUAAI.&& MMS0 

C7 ATC CTGA CAACTTCGA AGACAG TTCCTC CTTGTCGTCTGGAATATCCG ATAACAAC GAGCTCGACGAC A TATCCAC GGACGATTTGTC CGGAGTAGAC 4100 
ATGGCAACACTCGCCTCCAAACATAGCGACTArrCCCACTT^ 4m 

CA67C6A7TCTCGATCTCGAGCAGAACAGGACAATGTGTACAAACTTCTGTCCCaGTGCCGAAC<UGCCAAC<»^ 43M 
ACAACATTCG C7 AAGATC CCCGGG ATACTCATCCTATTCTCCACA CTTA 7 C<GTGTCAGCTG ATAAGGACAOATGTCTA TGCACTC ACAGACTAGTC GA 4400 
CGACCTrCTTCACAAAAACCAAGCTATTCAGGCCAATTTCATTC 4S 00 
CTCTETTGAGCCCGAGACG GGT G CCG AACTC CA rGTCGAAATATG ATTCTTCA CC ATC CT ACT CGG CGCGTT CCCGAGCTGG AAGCTCTACTGGTATCTA 4000 
TGGAGAGACGTrCCAACTGCACAGACTATCCGATGAAAAATCCCCCGCA 4 ; M 
GCATATGGATCTCTCAATCAGAAGTA CGAA C ATGCTATTC GGGAt ATGGCA C CTGACTT WACTCTTACAA GA ACACTCTCGAO CACTAACCAAGAAAC 4800 
AGGAGAA CT ATGGA GCATTGTTTGAU r r 1 1 1 6AGCAAAAGCTTAGAAMCTCACTCAACACATTGATCGATCCAACTTC AAGCCT6AAGAGGCAATACG 4300 
ATTCAGCCAGGACaTTCXTCATTTGAGGGATATTAGCAATCATCTTCCATCCAACTCAGCT saw 
TCTCTGGAATCAGTTGCATCCCATCGATCATC&ATGTCATCGTC(TTCGAAAAGCAGCAAKAGGAGAA<WTC^ $ m 
AGAGCTGGATCCGCTCCrCACTCTCCAAGTTCACCAAGAAGAAGAACAAGAACTACCACCAAGCACATATGCCA TCAATTTCCGGA T CTCAAGGAACTCT 5230 
TGACAA CATTGATGTGATTGAGTT GAA GCAAG A GCTC AAAGAACGCG ATAGTGCA CTTTACG AAGTCC GCCTT C ACAA TCTGGATCGTGCCCCCGAAGTT S3B0 
<^OTa<*GG(a<aa<rrCAACAAGUCAAAACCU^ S400 
^CCGCGCCTCAATTCCAtTrTATCTACGACGATGAGCATGTCTATC SS00 
CAACTCAATCAACCTTACTGTAAACGTGCACATC GCTGGAGAAATCAGTTCGATCGTTAACCCGGACAAAGAGA TAATC GTAGGATATCTT&CCATCTCA 5600 
ACCAGTCAGTCATGCTG6AAAGACATTGATGTTTCTATTCTAGGACT 5700 
CTCCTGATTCTATCCTT GG CTATCAAATTGGTGAACTTCGA CGCGTCATTGGAG A CTCCACAACCATGATAACCACCCATCCAACTCACA TTCTTACTTC $800 
CTCAACTACAATCCGAATGTTCATGCACG<fTGCCGCAeA(a^ $900 
CrrCAAGTCAATTrrGACAGACAGACGTCTGCTGTTAGCTGGAGC^CTGGAATTGCAAACAGO^ £000 
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GAACAAATTAA TCCGAAGAT AG TATTGTTAATATCAGCATTCCTGAAAACAATA^GAAGAATT^^ 6100 
AA GCAAAGAATC A ^GC ATC (TTAATTCTAGATAATA TC CCAAAC AATC(^TTCCATTTCTTCT ATCCGTFTTTC CAAATGTC CCACTT CAAAACAACGAA 6209 
GGTCCATTTGTAGTATGC ACA GTCAACCGATATC AAATCCCTC AGCTTCA AAtTCAC CAC AATTTCAAAA TGTC ACT AATGTCGMTCGTCTCCAAGGAT 6380 
TCATCCTACGTTACCTCCGACGACGGGCGXrTAGAGGATGAGTATC ^ 
AGOCTTCAGGCCGTCAATAATTTTATTCAGAAMCGAATT^ ^ 
TCCCGTGAATGGTTCATTCGATTGTGGAAT<iA GAACTT CATTCC ATATTTGC AACCTGTTGCTAGAGATGG CA AAAAAACC^ 6606 
TCGAGGATCCCACCGACATCGTCTCTAAAAAATGGCCGT^ 67dd 
GTCACCTGCCAACTCATCCC6ACWCACTTCAATCCCCTCGACT <Sdd 

GAaaAATcrrcrcTCGCcrracccccGCTTTccrrATcncGTACCGCTACw «9ao 

CCATCTCGCGCCCGTGCCTCTGACTTCTAACTCCAAT^ACT 7008 
CAAAAAAACTTCTTCTTAATTTCTTTGTTTTrr AGCTT CTTTTAA GTCAC CTCTAACAATGAAATTGTGT AG ATTCAAAAATAGAA TT AATTCGTAATAA 7100 
AAAGTCGAAAAAA.A TTGTGCTrCCTCCCCCCATTAATAATAArrCTATCCCAAAATrTACAC AATGTTCTGTCTAt^ TGTTTTnTTACTTCT 7200 

GATAAAi i 1 1 1 1 1 i GAAACATCATAGAAAAAACCGCACAC AAAAT ACCTTATCATATCTrTAXGTTTCAGTTTAT 7300 
TCTGGGCCTCTCATGACGTCAAATCATGCTCATCCTG^^ 7400 
TGCTTT7 GCTTTTTGGGGCTTTCCCCTATTCTTTGTCAAGAGTTTCCAGGACSGCGTT^^ tSOO 
AA^TCGGAAGMGCrrrCGGTTTGAGGCTCACTGGAAfiGTGACTAGAACTTGAT TgOO 
CAGAATACATTCCCAATATACCAAACATAACTGTTTCn^ 7700 
GACACATGCAGCTCCCGGAGACGGTCACAGCTTCTCTGTAAGCGGATGCCGu^ 7600 
GGGCTGGCrTMCTATGCGGOTCAGACCACATTGTACTGAGAGTGCAC 7900 
AGGCGGCCTTAAGCCCCTCCTGATACGCCTATTTTTAT^ 

CGCGGAACCCCTAM 1 1 A TTTTTCTAAATAC A TTC AAATATGTA TC CGCTC ATGAGA CAA I AAC CCTGATAAATGCTTCAATAATATTCAAAAAG GA 6100 
AGAGTATtoGTArrCAACATTTCCGTCTCGCCCTT^ g 2 00 
AGATGCTGAA GATCAGTTGGGTGC ACGAGTGGGTTAC ATC GA ACT6C ATCT CAA CAGCGGTAAGA TCCTTG AG ACTTT7CGCC CCG AACAA CGTTTTC CA 8360 
ATGATGACCA CTTTTAAAGTTCTGCTATGTGGCG C GCT ATTATCCCCTATTGAC GCCGG6C AAuAGCAACTCG GTCGCCG f AT ACAtTATTCTCAGAATG 6400 
ACTTGGTTGAGTACTCACCAGTCACAGAAAAGCArtTTAC^ 8S00 
GGCCAAmACTTCTGArAACGATCGGAGGACCCAAGGAGCTAACCGCTTTTT^ 8600 
GAGCTGAATCAaGCCATACCAAACGACCAGCGTCACACCACuATGC^ 8700 
TACCnCCCGG(^CAATTAATAGACTGGAT<XUGGCCGATAAJlGTTCCAGCAC«CTTaGCK 6100 
ATCTGGAGCCGGTGACCCTGWTCTOXGGTATaTT^ 8980 
GCAACTATtWATGAACGAAATACACAGATCCn-WGATAGGTGCaUCT^nAAGCAnGGTAACTCTaW 9000 
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flq44pNP9Mac (1 > $*» ano Sequence ^ ^ 

ttgatttaaaacttcatttttaatttaaaacgatct 910O 

AttG7CAGACCCCCTAGAAAAGATCAAA^(aTOT^ 920e 
6T66TTT<n"rTCCCC6ATCAAGACCTACCAACTCTTTTTC CGAAGCTAACTCGCTTC ACCAGACCGCA GAT ACC AAATAOCTCCTTCTACTCTA6CCCT 9300 

AGTTAGGCCACCACTTCA>GA>CTC7GTAGCACCGCCTACATAC 9440 
TACCGCGTTGGACTCAACACGATAGTTACC&GATAAGGCGCAGCGCTCGG 

ACCGAACTGAGATACCTACAKCTGAGCA1TGAGAAAGCGCUCG 9668 

GAGAGCGCACGAGGGAGCTTCCAGKG&^CGCCTGGTflTCTTTATA^ 9780 

GTCAGGGGG<»CGGAGCCrATGXAAAJUCGCCAGCAACGCGG<!CTT7TTACCCTTCCT 9060 

TCCCCTCATTCTGTGX^TMCCGTAn'A^CGCCT^ 9980 
CGGAA GACCGCC CA_AT ACGC AAACCG CCTCTCC CCGCGCG TTGGCCO *T?CATT£ATGCAGCTGGC ACGACAGOTTTCCC CACTGGAAAG CGCCCAGTCA 

GCGCAACGCMTTAATCTGACTTAXCTCACTCATTA leiOC 
CAATTTCACACAGGAAACAGCT 18122 
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TTTTCA6ACTTGC GCTATAA ATT7TTGTCA AA ACTAGGAATCTTAAAATATTrGrATTTTTClVAA GAATGCTCCTCAA T CTCAAATTCATATTTTAT A 6100 
ATTTCAGAC CCCGTGATAT CTCAAAAACCACAACCT6AAAAG CTCCAATCAATCA C CATCGACACGACOG ACCT7CCACCCCTT CCACCTCTAAAATCA G 6260 
TT<nTCCACTTAi^T<WCTTCAATCCCACAACCACCAAC(rrACCATCTTCT^ ^ 
G^CCTCCGCGTCTGAAGACT^ M9 
WAAAGA*TAAGACATCA6AATCaGCGGCTACACaa<^ 6Sed 
CTCGTCGAGC ACA GAACGGCTATCCTGAC AACTTC G AA GACAGTTCCTC CTTCTCCTCTGGAATATCCGATAACAAC G AGCTCCA CGACATATCCACGGA 6600 
CGATTTGTCCGGAGT AGACATGGCAACACHXGCCTCCAAACAT^CGACU^ 6?M 
CCGAOrrCGGTCCTCCACATCACTCGATTCTCGATCTCGAGC^^ 6SO0 
CTGCCACCTCAAC CTTC GGACAAC ATTCGCTAAGATC CC CGGGATACTCATCCTATT CTC CACACTTATCaGTGTCaCCTG At AAGGACACAATUTCTAT 6900 
GCACTCACAGACTAGTCGACGACCTTCTTCACAAAAACCAAGO'ArrCAGG^ 7*9* 
*CC^GCACACAATGGCGGCTCTCTTGAGCCCGAGACGGGTGC<GAACTCGAT^ 7100 
GAAGCTCTACTGGT ATCTAT(£A<LACiACGTTtCAACTGCA(>(UCT 72W 
*TCACTGGCTAG<*CGACAGCATATGGATCTCT^ 

GACTCACTAA<CAA<iAAACAGGACMCTATGCAGCATTGTTrWTCTTTTT^ 74a© 
AGCCTGAAGAGGCAATACGAT7CA<K>CAGGACArTGCTCATTTGAG 7500 
TGAGCTTOfTCCT CAAC CA TO CTCGAA TCA GTT GCATCCCATCG ATCAYCGATGTCATCGTCGT C GAAAAGC A GC AAGCAGGAGAAGATCAOCTTGAGC 7689 
TCGTTTG GCAAGAACAAG A AGAGCT G GATCCGCTCCTCACTCTCCAAGTTCACCAAG AAGAAGAAC AAGAA CTA CGACGAAGCACATATGCCATCAATTT 7700 
CCGGATCTCAAGGAACTCTTGaCAACATTGATGTGATT 7fteO 
GGATCCTGCCCGC6AA<rnGAT(rrTC T GAX;GCA6ACA(rrGAACAAGn'GAAAACCGA6A^ 7300 
CCWCCACTCGTGCrTCTTCCCGCGCCTCAATTCCAGTTAT^ 1300 
CG AAACGATCCTCTGGCTGC AACTCAATCAAQG TTACTGTAA ACGT GGACATCGCTGGAGAAATC AGTTC GATC GTTAACCCGGACAAAGAGATAATCGT S100 
AGGATATCTTCCCATCTCAACC AGTC AGTCATGCTGGAAA GACATTGATGTTTC TATTCT A GGACTATTTGAAGTCTACCTA.TCCAGAATTGATGTGGAG B2i» 
C ATCAACTTGGAATCfiATGCTCCTGATTCTATCCTTG GCTATCAAATTGGTGAACTTC CAC GCGTCATTGGA CA CTCCACAA CCATGATAACCACCCATC 8300 

CMaGACAnmAmccTCAAaAOUTccGAAT<rrrcATGacGCTGCCGCACAGA(rrcGC^ 4400 

GCAAATWTTCTCCAACTCGTCAAGTCAATTTTGACAG AGA GACGTCTG GTC (TAGCTGGAGCAACTG GAATTG GAAAGA6CAAACTGG CGAAGACCCTC «SC0 
GCTGCnATGTATCTATTCGAACAAATCAATCCGAAGATAGTATTtTr^ g$00 
GtCrGGAAAAUTOTGAGAAGCAAlGAATCATG<>TCGm^ <700 
CC CACTTCAAAACAACGAAGG TCCATTT5TAGTA TGC ACAGTCAAC CG4TATCAAATCCCT G MCTTC A AATTCA CCACAATTTCAAAATGTCACT AA7C *t09 
TC6AATCCTCTCGAA G GATT C A TC CTAC GTTACCTCCGACGA C GGGCGGT AGAGGATGAGT A TQ GTCTA ACTG TACAGATGCCATCAGAGCTCT7GAAAA 8900 
T CATTGACTTCTTCCCAA TAG CTCTTCAGGCCGT CAATA ATTTTATTGA G/LAAACGAATTCTGTTG ATGTGACAC7TGCTCC AAWiXATGCTTG A ACTC 
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TCaaAACTCTCGATCGATCCCCTGAATGGmAmw™^ 

TTCOCTCGCTIiCACmCTTCMCWTCCCACCMarCCTCTaMAAAATGCCCGTGSTTCQATC 

AAaCCAAWCCTCCTCCCCTCACaCCCAACTOVTCCCGACAACACTTCmcCCCTCMGTCGnCATO^ 

CGACAACAmCAACAGAAGACTaWTCTTarTCGCCTaCCCCCGCrrrCCTTArcmGTACC^ 

««caeraTCAATcamacGc«cc^aTcw«cTTCT 9M8 
cccoATTmcTTAmTCAAAAAAAmcTTmAAmcm<nTrmAGCTTamAAGT« 

TA^TTAAnCGTAATAAWGTCWAAAWATTGTGaCCCTCCCCCCATTAATAATAATTCTATCC^AMTOACAaATGrTCTGTCTACACTTC 9700 
TTATGT7TTTTTTACTTCT«TAAATTTTTTTTG W A<ATCATA«AAA^CCaACAC 

ArrTTTATTTCTTCGCACGTaGGGCCTCTCATWCGTCAMTCATCCTCATCGTCAAAAAGTTTTGGAGTAT^ 

TTTATGAAATTAATTTTCC^GCTTTTGCTTTTTGXKGGTTTCCCCTATTGTTTGrCAAGAGTTTCW ^ 

rGATGACCACGATGCAAGAAAGATCCGAAGAAGCTrTCGGTrrGAGGCTCAGTGGAAGGTGACTAGAAGTrGATAATTTG^ U1 „ 
MCmTTTGCCTTAAATG»a<^TACAmCCmATA«^ 

« T «CC<rr<^CCTaGACACATGCACaC^ 10 3<K 

CGGGTGT7GGCGG<TCTCGGGGCTG<KTTAACTATGCGGCATCAGAGCAGATTGTAC76A6AGT6CACCATATGCGGTG1C*W 1048e 

AAGGAGAAAATACCGCATCACKGOCCTTAAGGGCCTCCTGATACCCCTATTT^ iesw 
CACTmCGGGMAATOTGCGCGGAACCCaATTTGmATTTrTCTAAATACATTCAAATATGTATCCOTCATGAW 

CAATAATATTGAAAAAGGAAGAGTArGAGTATTCAACATTTCCGTGTCSCCCTrATTCCCTrTTTTGCGGCATT^ W7W 

AAACGCTGGTGAAAGTAAAAGATGCTGAAOATCAGTTGGGTGCACGAGTCGCrrACATCGAACTGGATCTCAACAKGCM 1080t 

CCCCGAAGAACGTTTTtCAATGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGGTATTATCCCGTATTGACCCCGGWAAGAGCMCT 10EXW 

ATACACTATTCTCAGAATCACTTGGTTGAGTACTCACCACrCACAGAAAAGCATCnACGGATGGCATGACAGTAAGAGMTTATCCAG^ 11«>C 

CCATGAGTWTAACACTCCfiGCCAACTTACTTCTGACAACGATCCGAGGACCGAAGGACCTAACCGCTrrTTTCCACAAC^ HUK 
CCTTGATC6TTGGGAACCGGAGCTGAA ^^^KfA^ACCAAACGACGAGCGTGACACCACGATGCCTGTAGCAATGGCAACAACGTTGCGCAAACTATTA U2W 

ACTGCCGAACTACTTACTCTAiJCTTCCCMCAACAATTAATAGACTGGATGGAGGCGGATAAAGTTCCAGGACCACTTCT UMJ 

GCTGGTT7ATTGCTGATAMTCTGGAGCCGGTGACCGTGG6TCTCCCGGTATCATTGCAGCACTGu6GCCAGAltiGTAAGCCCTCCCGT^ U4« 

CTACACCACGGGGAGTCAGGCAACTATGGATGAACGAAATAGACAGATCGCTGAGATAK luee 

TAOCATATATACrrrAGATTGATTrAAAACTTCATTTITAATrrAAAAGGATCTAGCTGAAGATCCTTTTr U6 « 

GTGAGTTTrCGTTCCACTGAflCvTCAGACCCCCTAGA^AACATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGC^ U7W 

AAAACCACCCCTACCAGCGGTGGTTTGTrTGCCGGATCAAGAGO'ACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCG U8M 

GTCCTTCTAGTCT AGCCGrAflTlAGGCCACCACrrCA>GAACTCTCTAGCACCGCCTAC*T<Cn"CGCTCTGCTAATCCT 1194* 

GT6CC6ATAACTCGTOTCrrACCGGGTTGGACrCAA6ACGATAGTrACCGGATAAGGC&M 12wt 



Thursday, 27 November 1997 16:46

fig 35 pNPO Map (1 > 12641) Site and Sequence Page 8

mGGAGC<^C<*CCT«ACCGWCTGA<aTA^^ ^ 

CTcarrm CT «T( i ac<nac« ; GG 0 ccwcCT*T<^*cKa(;cMc««c(T^ 

CATCTTCTTTCCTCCCTTATCCCCT<UTTCrCT<;WTAACCGT*TTACC6CCTTTGACTC*CCTGATACC(XTCK 1240C 
«GTCAGT<UGC6AGGAAGCGGAAGAGCGCC<^TAC^ 

ACreGAAAGCCGGCAGTGAGCGCMCGCAATTAATCTGAGTTAGCTCACT^ 12668 
TGGAATTGTGAGCGGATAACAATTTCACACA66AAACAGCT 12641 
Figure 36: Association of C. elegans UNC-53 (expressed
from pTB72) with the microtubular cytoskeleton of HepG2
cells. (A) Microtubules stained with YL1/2 antibody to
tubulin and (B) staining for C. elegans UNC-53.




WO 98/24810 



45/270 



PCT/EP97/06956 



Cos MCF7 HepG2 




a b c 

Figure 37: Microtubule (+)-end binding of C. elegans UNC-53
following transient transfection with pTB72 of HepG2 (a), MCF7
(b) and Cos cells (c). C. elegans UNC-53 was visualised by
immunofluorescence using mAb 16-48.

Tuesday, 18 November 1997 11:48

fig 34 pLM4 (1 > 10070) Site and Sequence



GTATCTG4AAGACTTCTTTTTCTTr«r.4 ' ' . b AGCCC ^ r GCC TC AGAAACC actrrr . , . .- 



Page 1



AACTTCGG A TC AAGAG* 
TTACGGAGTCTTTGGrGTGGGTTTrTTrtA,,,.- 




CACCCGTGA 



T . E T A « P 3 I K S S T 

■-■ACAGTGGCTCCCGGGACGAr,Tr;^r,^^ firTrrn :; 



<- S « V G T 



AAATGAGGAGGAGGAGCCAGAGAA,.A^r..,.-:. T , T ,--. 



AGCTGCGCT 
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fig 34 pLM4 f 1 > 10070) Site and Sequence 



Page< 



GCAGTTGGAGGTGGACCTGCTGAAAGCAGAGAATGACCGACTGAAGGTAGCCCCAGGCCCCT CATCAGGCrCCACTCCAGGGCAGGTCCCTGGATCA7C7 
CGTCAACCTCCACCTGGACGACTTTCGTCTCTTACTGGC j"GACTTCCATCGGGGTCCGGGGAGTAGTCCGAGGTGAGGTCCCGTCCAGOGACCT AGTAGA 




-ORF pLM1



0 L E V D L L K A E N D R L K V A P G P S S G S T P G Q V 3 G S o 

GCATTATCTTCCCCACGCCGCTCCCTAGGCCTGGCAC TCAC CCATTCCTTCGGCCCCAGTCTTGCAGACACAGACCTG TC ACCCATGGATGGCATC AGTa 

' ' ' ■ i . ■ i i j i ■ i i i i iii i i . | . . t 

CGTAATAGAAGGGGTGCGGCGAGGGATCCGGACCGTGAG TGGGTAAGGAAGCCGGGGTCAGAACGTCTG tg TC TGGACAGTGGGTACCTACCGTAG tcat 



-insert pLM1



-ORF pLM1



A 1 S S p ft R S L G L A L T H S FGPSLAOTOLSPMDGIS 



CTTGTGG 



tccaaaggaggaagtgaccctccgggtggtggtgaggatgcccccgcagcacatcatcaaaggggacttgaagcagc 



aggaattcttcctggg 



gaacaccaggtttcctccttcactgggaggcccaccaccactcctacgggggcgtcgtgtagtagtttcccctgaa cttcgtcgtccttaagaaggacc 

-insert pLM1 



+ 5c :x 



-ORF pLM1



T C G P K E £ , v T I * V V V 8 M P P Q H I IKGOLKQQEFFLG 

ctgtagcaaggtcagtggaaaagttgactggaagatgctgga^ 

gacatcgttccagtcaccttttcaactgaccttctacgacctacttcgacaaaaggttcacaagttcctgata taaagattttacctgggtcg gagatgg " >,W * 

-insert pLM1



-ORF pLM1



C , S K v S G KVOWKMLDEAVFQVFKOY I S K M 0 => A S T 

ctuggactaagcactgagtccatccatggctacagcatcagccacgtgaaacgagtgttggatgcagagccccc cgagatgcctccttgccgtcga 

GaCCC TGAT TCGTGAC rCAGGTAGGTACCGATGTCGTAGTCGGTGCACTTToCTCACAACC T ACGTC TCGGGGGGC TCTACGGAGGAACG5C AGCTCCAi: ^ 



-insert pLM1



L G L 



-ORF pLM1 



T E 3 ' H GYS I SHVKRVLDAEPPEMPPCRR 



TCAATAACATATCAGTCTCCCTCAAAGGTCTGAAGGAGAAATGCGTCGACAGCCTGGTGTTCGAGACGCrGATCCCCAAGCCGAT GATGCAGCACTACAT 
AGTTATTGTATAGTCAGAGGGAGTTTCCAGACTTCCTCTTTACGCAGC TGTCGGACCACAAGCTCTGCGAC TAGGGGTTCGGCTACTACGTCGTGATCTA ^ 



-insert pLM1



-ORF pLM1

V w . N ' S V S L KGLKEKCV03LVFETLIPKPMM3HYI 
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Pagef 



AA^CC TCCTGCTGAAGCACCGGCGCCTCGTCC TCTCGGGCCCCAG^GGCACGGGC AAGACC TACCTGACCAATCGCTT GGCCGAG TACC TGG TGGAGCG: 

IGGAGGACGACTTCGTGGCCGCGGAGCAGGAGAGCCCGGGGTCGCCGTGCCCGTTC TGGATGGACTGGTTAGCGAACCGGC TC ATGGACC ACCTCGCl- ^ 




-ORF pLM1



L L , L K H , R R L v L S G P S G T G K T YLTNRLAEYLVER 



TC rGGCCGTGAGGTCACAGAGGGCATCGTCAGCACCTTCAACATGCACCAGCAGTCTTGCAAGGATCTGCAAC TGTATCTTTC CAACCTAGCCAACCAGA 
AGACCGGCACTCCAGTGTC TCCCGTAGC AGTCGTGGAAGTTGTACG TGGTCGTCAGAA CGTTCCTAGACGTTGACATAGAAAGGTTGGATCGGTTf;?;T«"~ 

-insert pLM1 




Z G 



U 1 V O I 



r n n ri u Q SCKDLQLYLSNLANO 



TAoACCGGGAAACAGGAATTGGGGATGTGCCCCTGGTGA TTCTATTGGATGACCTGAGTGAAGCAGGCTCCATCAGTGA GTTGGTCAATGGGGCCCTC AC 
ATCTGGCCCTTTGTCCTTAACCCCTACACGGGGACCACTAAGATAACCTACTGGACTCACTTCGTCCGAGGTAGTCACTCAACCAGTTACCCCGGGAGn 




-ORF pLM1



1 °. R E T G G 0 V . P L V ' L L 0 D L 5 E A G S t S £ L V N G A L T 

CTGCAAGTATCATAAATGTCCCTATATTATAGGTACCACCAATCAGCCTGTAAAAATGACACCCAACCATGGCT 

GACGTTCATAGTATTTACAGGGATATAATATCCATGGTGGTTAGTCGGACATTTTTACTGTGGGTTGGTACCGAACGTGAAC TCGAAGTCCTACAACTGS 



57CC



-ORF pLM1



K Y H K C P Y I IGTTNQPVKM 



TPNHGLHLSFRMLT 



.gag: 



■ -CTCCAACAACGTGGAGCCAGCCAATGGCTTCCTGGTTCGTTACCTGAGGAGGAAGCTGGTAGAGTCAGACAGCGACAT CAATGCCAACAAGGA, 
AA3AGGTTGTTGCACCTCGGTCGGTTACCGAAGGACCAAGCAATGGACTCCTCCTTCGAC CA TCTC AGTCTGTCGC TGTAGTTACGGTTGTTCCTTCT*"" 

-insert pLM1



-ORF pLM1

= S W N V E P A " G , F l V R * L ft R K L V E S 0 S D I N A N K E E 

TGCTTCGGGTGCTCGACTGGGTACCCAAGCTGTGGTATC ATC TCCACACCTTCCTTGAGAAGCACAGCACCTCAGACTTCCTCATCGGCCC ttgcttctt 
AC 3 AAGCCC ACG AGCTG AC CCATGGGTTCG AC AC CAT AG TAGAGGTGTGGAAGGAACTCTTCGTGTCGTGGAG TCTGAAGGAGTAGCCGGGAACGAAGAA 



-ORF pLM1



L L R V L D V V p K L VYHLHTFLEKHSTSDFLIGP 



C F F 
fig 34 pLM4 (1 > 10070) Site and Sequence



Page TO 



TCTGTCGTGTCC CATTGGCATTGAGGACTTCCGGACCTGGTTCATTGACC TGTGGAACAACTCTATCATTCCCTATCrAC AGG AAGGA GCC AASGA TCG'- 
AG ACAGC ACAGGGfAACCGTAAC TCC TGAAGGCCTGGACCAAGTAACTGGACACCTTGTTGAGATAGTAAGGGATAGATG TCC TTCCTCGGTTrrT^rr^ ^ 




L S C P IGEEDFRT 



WFIDLWNNSIIP 



YLQE GAKOG 



ATAAAGGTCCATGGAC AGAAAGCTGCTTGGGAGGACCCAGTGGAATGGGTCCGGGACACACTTCCCTGGCCATCAGCCCAACAAGArrAATr aa *>^/»T.-r" 



^^CAGGTACCTGTCtrTCGA^ 

-insert pLM1



olGv 



i K V H G Q K A 



-ORF pLM1



AWEDPVEWVRDTLPWp 



SAQQOOSKL 



A CCACCTGCCCCCACCCACCGTGGGCCCTCACA6CATTGCCTCACCTCCCGAGGATAGGACAGTCAAAGACA6CACCCCAA GTTCTCTGGACTCAGATC- 
TGGTGGACGGGGGTGGGTGGCACCCGGGAGTGTC GTAACGGAGTGGAGGGCTCCTATCCTGTCAGTTT CTGTCGTGGGGTTCAAGAGACt'TnflnTrTAf:'-. * 



VHLPPPTVGPHS 1 



-ORF pLM1 



A S P P E 0 R T V K 0 S T P S S L 0 S D P 



TCTGATGGCCATGCTGCTGAAACTTCAAGAAG CTGCCAACTACATTGAGTCTCCAGATCGAGAAACCATCCTGGACCCCAACC TTCAG6CAACACTTTAA 
A 'JACTACCGGTACGACGACTTTGAAGTTCTTCGA CGGTTGATGTAACTCAGAGGTCTAGC TC TTTGGTAGGACCTGGGGTTGGAA GTcrnTTnTftA aatt 



-insert pLM1



LMAMLLKLOE 



-ORF pLM1 



A A N Y [ESPDRET! 



LDPNLQATL 



GG-orTCGGCAATCACTGrCACCCCCGGACAGCAGA ACGCTGGCATCAGCTArCTrAGCTCCTCCTCTCCCCTCrCCT CTTTCAGAGCACTr.,rr.-T f y,-: 
CvCAAGCCGTTAGTGACAGTGGGGGCCTGTCGTCTTGCGACCGTAGTCGArAGAATCGAGGAGGAGAGGGGAGAGGAGA^GTCTCGTGr^^t 



-insert pLM1



- G ' G " C H P R T A E » * " o ^ * . L L L s P L L F Q S T G s P 

CCCCAGGAGGAGAACAGGAGGGAGGAGGAGArGAAA G AGGAGGGACAGGTTCTTGGTGCTGTACCTTTGAGAAC TTCCTAGGAAGGAATG G rGG n(;T ,,- 
G'juGTCCTCCTCTTGTCCTCCCTCCTCCTCTACTTTCTCCTCCCTGTCCAAGAACCACGACATGGAAAC TCTTGAAGGATCCTTCCTTACC ACCCCACC-:' 



* p GGEQEG 



-insert pLM1



SG DERGGTGS VCCTFENFLGRN 



G G V 



GTTTGGGAACTTGTGCCCCCTAAACACATTTACTGGCC TCCTCTAATGACrTTGGGGAAAAGATGATTCTGGGTC TTTCCCTTGACTTCTTGTrTf A^T7 
C. AAACCCTTGAACACGGGGGArTTGTGTAAATGACCGG A GGAGATTAC TGAAACCCCT TTTC TAC TAAGACCCAfiAAArirrtA A/*rr ' . 
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fig 34 pLM4 (1 > 10070) Site and Sequence 



Page 0 



ACAAACTCCTGGGCTTTCTGGGGAGGGGTTCAGAAAACATCAAAACACTGCAGCAGTTCCCCGGAATTCAGCTTG GAC T TAACC AGGC TGAAC r TGC TC A 
TGTrTGAGGACCCGAAAGACCCCTCCCCAAGTCTTTTGTAGTTTTGTGACGTCGTCAACGGGCC^AAGTCGAACCTGAATT * ? 



-insert pLM1



TNSVAFVGGVQKTSKHCSSSPE 



FSLDL TRLNLL 



AAAGAAGCCGAATTCCAGCACACTGGCGGCCGTTACTAGTTCTAGATAACTGATCATAATCAGCCATACCACATTTGTAGAGGTTTTACTTGC 



TTTAAAA 



TTTCTTCGGCTTAAGGTCGTGTGACCGCCGGC AATGATCAAGATCTATTGACTAGTATTAGTCGGTATGGTGTAAACATCTCCAAAATGAACGAAATTT7 

> , 

-insert pLM1

K R S R 1 , P A , H V * ,P L L V L 0 N . S . S A I P H L R F Y L L . (> 

AACCTCCCACACCTCCCCCTGAACCTGAAACATAAAATGAATGCAATTGTTGTTGTTAACTTGTTTATTGCAGCTTATAAT GGTTACAAATAAACCAATA 
TrGGAGGGTGTGGAGGGGGACTTGGACTTTGTATTTTACTTACGTTAACAACAACAATTGAACAAATAACGTCGAATATTACCA 

T , • N I K . M Q L L L L TCLLQL I M V T N K A I 



T S H T S P 



GCATCACAAATTTCACAAATAAAGCATTTTTTTCACTGCATTCTAGTTGTGGTTTGTCCAAACTCATCAATG^ 
CGTAGTGTTTAAAGTGTTTATTTCGTAAAAAAAGTGACGTAAGATCAACACCAAACAGGTTTGAGTAGTTACATAGAATTGCGC 

1 II on 11 

A S Q J S Q 1 * " ^ F H C I L V V V C P N S S M Y L N A I V S V 

ATATTTTGTTAAA ATTCGCGTTAAATTTTTGTTAAATCAGCTCATTTTTTAACCAATAGGCCGAAATCGGCAAA^ 
TATAAAACAATTTTAAGCGCAATTTAAAAACAATTTAGTCGAGTAAAAAAT^ 



59iX 



:'J0C 



no*: 



NILLKFALNFC 



I S S F F N Q 



AEIGKIPYKSKE 



CGAGATAGGGTT GAGTGTTGTTCCAGTTTGGAAC AAGAGTCC ACTATTAAAGAACGTGGACTCCAACGTCAAAGGGCGA 

GCTCTATCCCAACTCACAACAAGGTCAAACCTTGTTC TCAGGTGATAATTTCTTGCACCTGAGGTTGCAGrTTCCCGCTTTTTGGCAGATAGTCCCGCTl 



■2a 



E IGLSVVPVWNKSP 



LLKNVDSNVKGRK TVYQGG 



GGCCC AC TACGT GAACCATCACCCTAATCAAGTTTTTTGGGGTCGAGG TGCCGTAAAGCACT AAAfCGGAACCCTAAAGGGAGCCCCCGATTTAGAGCTT 
<*SSGTGATGCACTTGGTACTCCGATTAGTTCAAA 



GPLREPSP 



SSFLGSRCRKA 



LNRNPKGSPRFRA 



GACGGGGAAAGCCGGCGAACGTGGCGAGAAAGGAAGGGAAGAAAGCGAAAGGAGCGGGCGCTAGGGCGC TGGC AAGTGTAGCGGTCACGC TGCGCGTAAC 
CTGCCCCTTTCGGCCGCTTGCACCGCTCTT^ 



RGKPAN V AR KEGKKA K G A G A R A L A S V 



A V T L R V T 



CACCACACCCGCCGCGCTTAA 



TGCGCCGCTACAGGGCGCGTCAGGTGGCACTTTTCGGGGAAATGTGCGCGGAACCCC 



TATTTGTTTATT7TTCTAAATA 



GrGGTGTGGGCGGCGCGAATTACGC G GCGATGTCCCGCG CAGTCCACCGTGAAAAGCCCCTTTACACGCGCCTTGGGGATAAACAAATAAAAAGATTTA; 



— n on 

T T P A A L N 



APLQGASGG TFRGNVR 



G T P [ C L F 



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fig 34 pLM4 (1 > 10070) Site and Sequence Pa 9 e ^ 

CATTCAAATATGTATCCGC ^ATGAGAC AATAACCCTGATAAATGC TTCAATAATATTGAAAAAGGAAGAG TCCTGAGGCGGAAAGAArra 



GCTGTGGAA 



GTAAGTTTATACATAGGCGAGTACTCTGTTATTGGGACTATTTACGAAGTTATTATAACTTTTTCCTTCTCAGGACTCCGCCrTTCTTGGICGACCCT"' ^ 
HSN MYPL.MRQ . p . M L 0 . Y , K r K s PEAERTSC^G 



TGTGTGTCAGTTAGGGTGTGGAAAGTCCCCAGG CTCCCCAGCAGGCAGAAGTATGCAAAGCAT6CATCTCAATTAGTCAGCA ACCAGGTGTGGAAAr.Tr- 

acacacagtcaatcccacacctttcaggggtccgaggggtcgtccgtcttcatacgtttcgtacgtagagttaatcagtcgttggtccacacctttcagJ 77C ' : 



" C y S ■ G tf, E S P Q A P Q Q AEVCKACISIS 



QQPGVESP 



CCAGGCTCCCCAGCAGGCA G AAGTATGCAAAGCA TGCATCTCAATTAGTCAGCAACCATA G TCCCGCCCCTAACTCCG CCCA T CCCGCrcrT flA r Tr ,..- 
GGTCCGAGGGGTCGTCCGTCTTCATACGTTTCGTACGTAGAGTTAATCAGTCGTTGGTAKAGGGCGGGLT Tr.^^;.^^.^^ ' ' 7 * 



Q A P Q Q A EVCKACISISQQP.SRP 



TCGTTGGTATCAGGGCGGGGATTGAGGCGGGTAGGGCGGGGATTGAGGCC- ° 
. LRPSRP.tP. 



CCAGTTCCGCCCATTCTCCGCCCCAT6GCTGA CTAATTTTTTTTATTTATGCAGAGGCCGAGGCCGCCTCGGCCTCTGAGCTA TTCCAGAAGTAGTGAG" 

GGTCAAGGCGGGTAAGAGGCGGGGTACCGACTGATTAAAAAAAATAAATACGTCTCCGGCTCCGGCGGAGCCGGAGACTCGATAAGGTciTCATCACTrl ^ 
P V P P I L R P H A D . F F L F M Q R p R p p R p L s y s R s g £ " 

AGGCTTTTTTGGAGGCCTAGGCTTTTGCAAAGAT CGATCAAGAGACAGGATGAGGATCGTTTCGCATGATTGAACAAGATG GATTGCACGCAGnTTrTrr 

tccgaaaaaacctccggatccgaaaacgtttctagctagttctctgtccIactcctagc^agcgtac t^cttgttct^ctaacgtgcgtccaagagg a ° CC 



-Kan/Neo



' A ' V E A • A F A K ' ° Q E T G • °S 7 R H , E Q D g L H A G s p 

GGCCGCTTGGGTGGAGAGGCTATTCGGC1 



ITATGACTGGGCACAACAGACAATCGGCTGCTCTGATGCCGCCGTGTTCCGGCTGTCAGCGC 



AGGGGCGCCCG 



ccggcgaacccacctctccgataagccgatactgacccgtgttgtctgttagccgacgagactacggcggcaca;..^.,,.^.^^^ 1 



-Kan/Neo



A A y y E R L F GYDVAQGTfGCSD 



AAVFRLSAQGRP 



GTTCTTTTTGTCAAGACCGACCTGTCCGGTGCCCT GAATGAACTGCAAGACGAGGCAGCGCGGCTATCGTGGCT GGCCACGACGGGCGTTCCTTncnrt^ 
CAAGAAAAACAGTTCTGGCTGGACAGGC CACGGGACTTACTTGACGTTCTGCTCCGTCGCGCCGATAGC ACCGACCGG TGCTGCCCnCAAnnAArfirnT- 



-Kan/Neo



V L F V K T 0 L SGALNELQOEAAR 



LSWLATTGVPC 



CTGTGCTCGACGTTGTCACTGAAGCGGGAAGGGACTGGCTGCTATTGGGCGAAGTGCCCGGGCAGGATCTCCTGTCATCTCACCTTGCTrrrr.rrr., 



GACACGAGCTGCAACAGTGACTTCGCCCTTCCCTGACCGACGATAACCCGCTTC ACGGCCCCGTCCTAnAnnArir.r 

-Kan/Neo



,GAA 

AGAGTGGAACGAGGACGGCTCTT J ~"" 



AVLOVVTEAGRO 



V L L L G E V P G QDLLSSHLAPAEt 



AGTATCCATCATGGCTGATGCAATGCGGCGGC TGCAT ACGCTTGATCCGGC TACCTGCCCATTCGACCACCAAG CGAAACATCGCATCGAGCGAGCACGT 
TCATAGGTAGTACCGACTACGTTACGCCGCCGACGTATGCGAACTAGGCCGATGGACGGGTAAGCTGGTGGTTCGCTTTGTAGCGTAGCTCGCTrr.Tr.rl *"* 



VSIMADA MRR 



1- H T L D P A TCPFDHQAKHRIER 



ACTCGGATGGAAGCCGGTCTTGTCGATCAGGATGAT CTGGACGAAGAGCATCAGGGGCTCGCGCCAGCCGA ACTGTTCGCCAGGCTCAAGGCr.Ar.rAT,- 
TGAGCCTACCTTCGGCCAGAACAGC TAGTCCTACTAGACCTGCTTCTCGTAGTCCCCCAGCGCGGTCfir.ri 



1 » H E A G L y Q Q Q Q L Q £ E „ Q G L A p AELFARLKASM 



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' ^^^ A ^^^^^^ A ^^^CGTCGTGACCCATGGCGATGCCTGCTTGCCGAATATCATGGTGGAAAATGGCCGC TTTTCTGGATTCATCGAC TGTGGCC 



| I [ | | . ■ ■ I . r ) | , | ' ,' ~ * ~ ' ' """"" * ** w «-»- • - 1 ' GGC " 

GGCTGCCGCTCCTAGAGCAGCACTGGGTACCGCTACGGACGAACGGCTTATAGTACCACCTTTTACCGGCGAAA^GACCTAAGTAGCTG/:CACCGG C r.-.l 36 ° 



-Kan/Neo



P D G E D L V V T H GDACLPNtMVENGRFSGF 



I 0 C G R L 



GGGTGTGGCGGACCGCTATCAGGACATAGCGTTGGCTACCCGTGATATTGCTGAAGAGCTTGGCGGCGAATGGGCTG ACCGCTTCCTCGTGC 
CCCACACCGCCTGGCGATAGTCCTGTATCGCAACCGATGGGCACTATAACGACTTCTCGAACCGCCGCTTACCCGACTGGCGAAGGAGCA~r; ' " 



TTTACGG" 
ACGAAATGCC* 



G V A 0 R Y Q D I A L A T R D IAEELGGEVAORFLVLY 



G 



ATCGCCGCTCCCGATTCGCAGCGCATCGCCTTCTATCGCCTTCTTGACGAGTTCTTCTGAGCGGGACTCTGGGGTTCGAAATGACCGACCAAGCGA CGCC 

tagcggcgagggctaagcgtcgcgtagcggaagatagcggaagaac tgcIcaagaagacIcgccctgagaccccaagctItactggctggttcgctgcg; M& " 



-Kan/Neo

' A A , p DSQRIAFYRLLOEFF 



AGLWGSK .PTKRR 



CAACC 



TGCCATCACGAGATTTCGATTCCACCGCCGCCTTCTATGAAAGGTTGGGCTTCGG 



■+- 



aatcgttttccgggacgccggctggatgatcctccagcg: 



gttggacggtagtgctctaaagctaaggtggcggcggaagatactttccaacccgaagccttagcaaaaggccctgcggccgacctactaggaggtcgcg 390C 

A G . S S S A 



P T C H H E 1 S I PPPPSMK GWASESFSG TP 



GGGGATCTCATGC TGGAGTTCTTCGCCCACCCTAGGGGGAGGCTAACTGAAACACGGAAGGAGACAATACCGGAAGGAACCCGCGCTATGACGGCAAT AA 
CCCCrAGAGTACGACCTCAAGAAGCGGGTGGGATCCCCCrCCGATTGAciTTGTGCCTTCCTCTGTTATGGCCTTCCTTGGGCGCGATACTGCCGTTATT *** 



G1S . C^S SSPTLGG G.LKHGRRQYRKEP 



A L . R Q 



AAAGACAGAATAA AACGCACGGTGTTGGGTCGTT7GTTCATAAACGCGGGGTTCGGTCCCAGGGCTGGCACTCTGTCGATACCCCACCGAGACCCCATTG 
TTTCTGTCTTATTTTGCGTGCCACAACCCAGCAAACAAGTATTTGCGCCCCAAGCCAGGGTCCCGACCGTGAGACAGC TATGGGGTGGC TCTGGGGTAAC ^ 
K ° R ' K » T V L G R >- F 1 N A G F G P R A G T L S 1 P H R 0 P I 

GGGCCAATACGCCCG C6TTTCTTCCTT7TCCCCACCCCACCCCCCAAGTTCGGGTGAAGGCCCAGGGCTCGCAGCCAACGTCGGGGCGGCAGGCCCTGC: 

CCCGGTTATGCGGGCGCAAAGAAGGAAAAGGGGTGGGGTGGGGGGTTCAAGCCCACTTCCGGGTCCCGAGCGTCGGTTGCAGCCCCGCCGTCCGGGACG-' ^ 
GA M TP AF LPFPr*p TPQ V PVK AQGSQPTSGRQ ALP 

atagcctcaggtta ctcatatatactttagattsatttaaaacttcatttttaatttaaaaggatctaggtgaagatcctttttgata atctcatgacc A 
tatcggagtccaatgagta ^ a ^*^gaaatctaactaaattttgaagtaaaaattaaattttcctagatccacttctaggaaaaactattagagtactgg" K<X 



N f ifnlkgsr.rsfliis 



p 



AAATCCCTTAACGTGAGTTTTCGTTCCAC TGAGCGTCAGACCCCGTAGAAAAGATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCG TAATCTGCTu 

tttagggaattgcactcaaaagcaaggtgactcgcag tctggggcatcttttctagtttcctagaagaactctaggaaaaaaagacgcgcattagacgac ?4C ' ; 



« S L n V SFRSTEROTP. 



-pUC ori



? R SKOLLE ilffca. sa 



cttgcaaacaaaaaaaccaccgctaccagcgg tggtttgtttgccggatcaagagctaccaactctttttccgaaggtaactggcttcagcaga gcgca: 

GAACGTTTGTTrTTTTGGTGGCGATGGTCGCCACCAAACAAACGGCCTAGTrCTCGATGGTTGAGAAAAAGGCrTCCATlGACCGAAr.Tr^TrT^.rr.T- 
fig 35 pNPO Map (1 > 12641) Site and Sequence

Enzym»s : aio46 ♦ntymes (No R*«i) 

CAUKttnAT«:ttCCC«GcmCTA>^ 

CT6TAT<^A<iTrCTTGA«TUTT™^^ 
AjTrAGATmACAAAACTTGOATAC^^ 
ACmCCACAAAT^AAUTCCAGGCTraGA^ 
CA^GAAWTWTAAAAATCTACTTTTmKAAA^ 

ACT^GGAGCGCMGTCTATACTWAMr^^ 

CACTAGTTTCAGAACCTTCCnTTTGTATGAAAAACrAAAAAAAAACTATTTCAAACCCTCACXGCCACCATGT^ 
^TTTACAAATCGCCTCC^^ 

TTCTCCArTTTGCTTATAAACATTTGTCTCTCGAAGMAACT 1288 
TWTGAATAAAGATAGAGCCGATGACACT(KCTGGTAGTAGTATGA6TCTAG^ 1380 

cMurrwcwTTAjneiis^ 

ATTTATTTTCWTATTTAATTrTGTCCAArrAGGGATAAACACGACTTTTA iSM 
TTCAAAAAATCAATAAATATTTCCTAAC AAATTGTAT G &CTAAAATTTTATTTCTACTGTTG AC M TATCTTTATATGTATCACTGTTTf CCATCTCAAA 1GBB 

ACCTTGAArCCCCCMGTTAlJMiGAAGCTCCGTGTCACATrrCCC^ 1788 
mTTGGOACGCGTAATAAAGTGC^^ 

TmCCATGCTTTn 5C CCCA7TmAAACTnaCACaCTTCATC«TaUaCGTATCATAAMAGTAT^ 
Aa^GATAOGAGCCACATGC^CT^ 

CnCTTrrcnCTTCGTTTTGACTCGCGCCTATTTTnGTGGCTGC 2188 
AAAAACGA66CAATTTAAAAATATTAAAA n AATGAGGTTOTAGATOTAGATnGC^AAAGAAGAAAAAAACAAAACAAATAGCAACCGCCAGATCAAAA 2288 

TTCTATTTAAASGTrnCAAGATGTTTAGCCAAGATTCGGCTGAACAGAAAACrG 2388 

ATfCTAAGCCTGAACTATA6CCTTATTCTAGATCnAGTTGCG£ATAAGCTCAAGCCCAAGCAGAAA 2*88 

(XTTGCTrUGTC^AATCCAGACrAGArTTCCAAGAUGrrTTCAATTTTAAATGTTTCCAGT^ 2588 

AAAATCCTTATCCCTTTCTCTCAWCTrTCAATTACAGArrCATCAAAGATTGCTAICAACCCAAAGACGT 2688 

ACTTCATCAMTAATACAAAnCAmCGTCC^^ 

TCCCXrCCrrCCKCTTCTTrrrAGAAATTATATTATT^ 2888 

JUkAAC07CT6GACCAOUU>CCCAGCTA<nTCGTGTTGCT*CAACTACAAAAATCGGAA4CTCA 2988 

(TTGCTTCTGTGAAGACTATTOykGCAAAACAAGAGCCCCATAACAGCGCT 3888 




Thursday, 27 November 1997 16:46

fig 35 pNPO Map (1 > 12641) Site and Sequence Page 2

"■""■"^^ ^ 

CAM(U{nAWGcwTTTCTCAocTcncTTCKcc<rrr?TTCAAnc(rnTr^ ^ 

ACCCMACTrmAAATTTTATmuaATa^TT^MAACTTT^ 
AA*TTTTA(^ACATTCAC*TA0GA<rrTCCCTCCCTAAAAOTTTCT5CTCCCCGA0M<iCTTrC6A^ 

TmracaAacAArmTAucAr^^ 

C^ACCA^CTacCCTCATa^TOTACTCTrAACCCCKCAT 

TAT7ATCAACAACCCTCTTCAOTTCA6TA7Trr7TCTTTCTCCCTAGAWCTrClTCTCAA6Tn 4989 

TITCCCTrnTTA6ATTTlAAGAITTT5ArACACTTrAAT(rrCT6<^TTC6CCrTCCTAAW ^ 

*"««"«^^ 

AOTC ™™^'«™™A^^^ 

<GTAATTmc«TnmAAACmCT C ^ 

TAAGTniTTTJCTCAAACTTGTTTAAAAAATT<^(ITfAAAAAATAAAATAATACTCTATAAAAATT^ 5660 
TTTCCAAATATCTCTAATACTAA^TTTCTTTCTGAATTTACAAAACATATTTrAAMTACAT^ S709 
*AAMT(>Q AAAAACCGM ATTTAATTCCAAAAAAAATATTTTGAAAAOCaCAAA f AAA6CTACT7TTCAAAAAATCAACAAAAAAAATATCAAAACAT 5800 
TCATATTTTU(MMrAGAGACATAGTCAAAAACTAICAAAAAATrCACTATrrTCCG<UAAAAAATTGA<UAAAA^ 5900 
fig 33 pEGFPxba (1 > 5447) Site and Sequence

. AAAAGATCAGAAGAAATTGGAGCAACTACCCACATCCATTATGCCACCCGCGGTTTCTAAATTACCCTCGCCACGTGTCGCCACGTCAGCAACCGCTTCA 


TTTTC TAGTCTTCTTTAACCTCGTTGATGGGTGTAGGTAATACGGTGGGCGCCAAAGATTTAATGGGAGCGGTGCACAGCGGTGCAGTCGTTGGCGAAG" 



-eGFPC.e.unc53xba



-C.e.unc53 xba 



KDQKKLEQLPTSIMPPAVSKLPSPRVATSATAS
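The layout of these sequence maps (upper strand, complementary lower strand, then a single-letter translation under the Standard Genetic Code) can be checked with a minimal sketch such as the following. The function names and the reading-frame offset of one base are assumptions made for illustration; the offset is inferred from the translation printed under the first block of fig 33 and is not stated in the figure. The input is the first 37 bases of that block.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Base-pairing rule used to derive the lower strand printed in the maps.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Return the complementary strand, base by base (not reversed)."""
    return seq.translate(COMPLEMENT)


def translate(seq: str, offset: int = 0) -> str:
    """Translate complete codons of seq, starting at the given offset."""
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


# First 37 bases of the upper strand of the first block of fig 33.
top = "AAAAGATCAGAAGAAATTGGAGCAACTACCCACATCC"

print(complement(top))           # lower strand as laid out under the upper one
print(translate(top, offset=1))  # KDQKKLEQLPTS
```

Under these assumptions the translation agrees with the first twelve residues of the single-letter line printed beneath the block, and the complement agrees with the start of the lower strand.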
GCAAC TAACCCAAATTCCAACTTTCCACAAATGTCAACATCCAGGCTTCAGACTCCACAGTCAAGAATATCGAAAATTGATTCATCAAAGATTGGTATCA 


CGTTGATTGGGTTTAAGGTTGAAAGGTGTTTACAGTTGTAGGTCCGAAGTCTGAGGTGTCAGTTCTTATAGCTTTTAACTAAGTAGTTTCTAACCATAG" 



-eGFPC.e.unc53xba



-C.e.unc53 xba 



ATNPNSNFPQMSTSRLQTPQSRISKIDSSKIGI

AGCCAAAGACGTCTGGACTTAAACCACCCTCATCATCAACCACTTCATCAAATAATACAAATTCATTCCGTCCGTCGAGCCGTTCGAGTGGCAATAATAA 


TCGGTTTCTGCAGACCTGAATTTGGTGGGAGTAGTAGTTGGTGAAGTAGTTTATTATGTTTAAGTAAGGCAGGCAGCTCGGCAAGCTCACCGTTATTATT



-eGFPC.e.unc53xba 



-C.e.unc53 xba 



KPKTSGLKPPSSSTTSSNNTNSFRPSSRSSGNNN

TGTTGGCTCGACGATATCCACATCTGCGAAGAGCTTAGAATCATCATCAACGTACAGCTCTATTTCGAATCTAAACCGACCTACCTCCCAACTCCAAAAA 


ACAACCGAGCTGCTATAGGTGTAGACGCTTCTCGAATCTTAGTAGTAGTTGCATGTCGAGATAAAGCTTAGATTTGGCTGGATGGAGGGTTGAGGTTTTT



-eGFPC.e.unc53xba



-C.e.unc53 xba



VGSTISTSAKSLESSSTYSSISNLNRPTSQLQK


CCTTGGGATCCACCGGATCTAGATAACTGATCATAATCAGCCATACCACATTTGTAGAGGTTTTACTTGCTTTAAAAAACCTCCCACACCTCCCCCTGAA 

GGAACCC TAGGTGGCC TAGATCTATTGACTAGTATTAGTCGGTATGGTGTAAACATCTCCAAAATGAACGAAATTTTTTGGAGGGTGTGGAGGGGGACT T 



-eGFPC.e.unc53xba



3 

3 * 0 P P OLON . S . SA IPHL R F Y L L .KTSHTSP . 

CCTG AAACATAAAATGAATGCAATTGTTGTTGTTAACTTGTTTATTGC AGCTTATAATGGTTACAAATAAAGCAATAGCATC ACAAATTTC ACAAATAAA 

1 i ' -4- I 1 ■ . , , 1 : , , , 1 ■ 1 ■ 1 1 1 1 h 'Oa' 

G»jACTTTGTATTTTACTTACGTTAACAACAACAATTGAACAAATAACGTCGAATATTACCAATGTTTATTTCGTTATCGTAGTGTTTAAAGTGTTTATT7 
MQLLLL TCLLQL ! MVTNKA lASQ I S Q I > 

GCATTTTTTTCACTGCATTC TAGTTGTGGTTTGTCCAAACTCATCAATGTATCTTAACGCGTAAATTGTAAGCGTTAATATTTTGTTAAAATTCGCGTTA 
CGrAAAAAAAGTGACGTAAGATCAACACCAAACAGGTTTGAGTAGTTACATAGAATTGCGCATTTAACATTCGCAATTATAAAACAATTTTAA5CGCAA T 
HF FHCILVVVCPNSSMYLNA. IVSVM I LL^FAL 



fig 33 pEGFPxba (1 > 5447) Site and Sequence

AATTTTTGTTAAATCAGCTCATTTTT^ 

TTAAAAACAATTTAGTCGAGTAAAAAATTGGTTATCCGGCTTTAGCCGTTTTAGGGAATATTTAGTTTTCTTATCTGGCTCTATCCCAACTCACAACAAc 

H F C .• 1 S S F F N Q.AEIGKIPYKSKE.TEIGLSVV 
' 1 — ** ' ' 1 ' ' 1 — i — ■ 

CAGTTTGGAACAAGASTCCACTATTAAAGAACGTGGACTCCAACGTCAAAGGGCGAAAAACCG TCTATCAGGGCGATGGCCCACTACGTGAACCATCACC 
GTCAAACCTTGTTC TCAGGTGATAATTTCTTGCACCTGAGGTTGCAGTTTCCCGCTTTTTGGCAGATAG TCCCGCTACCGGGTGATGCACTTGG TAGTGG ^ 
P V V N K S P I L K N V D S N V K G R KTVYQGDGPLREPSP 

CTAATCAAGTTTTTTGGGGTCGAGGTGCCGTAAAGCACTAAATCGGAACCC^ 

GATTAGTTCAAAAAACCCCAGCTCCACGGCATTTCGTGATTTAGCCTTGGGATTTCCCTCGGGGGCTAAATCTCGAACTGCCCCTTTCGGCCGCTTGCAC 
■ S S F L G S R C R K A L N P N P K G S P RFRA . RGKPANV 

GCGAGAAAGGAAGGGA AGAAAGCGAAAGGAGCGGGCGCTAGGGCGC TGGCAAGTGTA GCGGTCACGCTGCGCGTAACCACCACACCCGCCGCGCTTAATG 

CGCTCTTTCCTTCCCTTCTTTCGCTTTCCTCGCCCGCGATCCCGCGACCGTTCACATCGCCAGTGCGACGCGCATTGGTGGTGTGGGCGGCGCGAATTAC 
A R K £ G K KAKGAGARALASVAVTLRVTTTPAALN 

— .1.1 i.^.— t ■ I 1 , > | , , . t , t , 

CGCCGCTACAGGGCGCGTC AGGTGGCACTTTTCGGGGAAATGTGCGCGGAACCCCTATTTGTTTATTTTT C TAAATACATTCAAA TATGTATCCGCTCAT 
GCGGCGATGTCCCGCGCAGTCCACCGTGAAAAGCCCCTTTACACGCGCCTTGGGGATAAACAAATAAAAAGATTTATGTAAGTTTATACATAGGCGAGTA " 9<X 
A P L Q G A S G G T F R G N V R G T P t C L F F . IHSNMYPLM 

GAGACAATAACCCTGATAAATGCTTCAATAATATTGAAAAAGG 

CTC TG TTATTGGGACTATT TACGAAGTTATTATAACTTTTTCCTTCTC AGGACTCCGCCTTTCTTGGTCGACACCTTACACACAG TCAATCCCACACCT7 

R , Q •, P ■ • M L Q . Y . KRKSPEAERTSCGMCVS . GVE 

' 1 1 iiit, ^ . i . . . . . , t 

AGTCCCCAGGCTCCCCAGCAGGCAGAAG TATGCAAAGCATGCATCTCAATTAGTCAGCAACCAGGTGTGGAAAGTCC CCAGGCTCCCCAGCAGGCAGAAG 
TCAGGGGTCCGAGGGGTCGTCCGTCTTCATACGTTTCGTACGTAGAGTTAATCAGTCGTTGGTCCACACCTTTCAGGGGTCCGAGGGGTCGTCCGTCTTC 
S P Q A P Q Q A E V C K A C 1 S i S Q Q P G VESPQAPQQAE 

TATGCAAAGCATGCATCTCAATTAGTCAGCAACCATAGTCCCGCCCCTAACTCCGCCCATCCCGCCCCTAACTCCGCC 

ATACGTTTCGTACGTAGAGTTAATCAGTCGTTGGTATCAGGGCGGGGATTGAGGCGGGTAGGGCGGGGATTGAGGCGGGTCAAGGCGGG taagaggcggg 
V C K A C 1 S , 1 SQQP . SRP .LRPSRP.LRPVPPILRP 



CATGGCTGACTAATTTrTTrTATTTATGCAGAGGCCGAGGCCGCCTCGGCCTCTGAGCTATTCCAGAAGTAGTGAGGAGGC rTTT TTGG AGGCC TAGGC 7 
GTACCGACTGATTAAAAAAAATAAATACGTCTCCGGCTCCGGCGGAGCCGGAGACTCGATAAGGTCTTCATCACTCCTCCGAAAAAACCTCCGGATCCGA ^ 
" A D . ■ F F > F " Q » P ft P P P P L S Y SRSSEEAFLEA.A 

TTTGCAAAGATCGATCAAGAGACAGGATGAGGATCGTTTCGCATGATTGAACAAGATGGATTGCACGCAGGTTCTCCGGCCGCTT 

AAACGTTTCTAGCTAGTTCrCTGTCCTACTCCTAGCAAAGCGTACTAACTrGTTCTACCTAACGTGCGTCCAAGAGGCCGGCGAACCCACCTCTCCGATA 
F A K .' 0 Q E T G . G S F R M I E Q D G L H A G SPAAVVERL 

T CSGCTATGACTGGGCACAACAGACAATCGGCTGCTCTGATGCCGCCGTGTTCCGGCTGTCAGCGCAGGGGC6CC CGGTTCTTT 

AGCCGATACTGACCCGTGTTGTCTGTTAGCCGACGAGACTACGGCGGCACAAGGCCGACAGTCGCGTCCCCGCGGGCCAAGAAAAACAGTTCTGGCTGGA ^ 
F G Y D V A Q Q T I G C S 0 A A V F R L S A Q GRPVLFVKTDL 

GTCCGGTGCCCTGAATGAACTGCAAGACGAGGCAGCGCGGCTATCGTGGC rGGCCACGACGGGCGTTCCTTG CGCAGCTGTGCTCGACGTTGTCAC TGAA 
CAGGCCACGGGACTTACTTGACGTTCTGCTCCGTCGCGCCGATAGCACCGACCGGTGCTGCCCGCAAGGAACGCGTCGACACGAGCTGCAACAGTGACTT ^ 
3 G A L N £ L Q O E A A RLSVLATTGVPCAAVLDVVTE 
fig 33 pEGFPxba (1 > 5447) Site and Sequence

' G '~" GG AAGSGACTG GCTGCTA ^^^^^^^^^^^^^^^^^^^^^^I^^^^^^^CCTTGCTCCTGCCGAGAAAGTATCCATCATGG CTGATj'*/'^ 

cgcccttccctgaccgacgataacccgcttcacggccccgtcctagaggacagtagagtggaacgaggacggctctttcataggtagtaccgac-'-gt} - 170 ' 

AGR DV/ LLLGE VPG QDLLSSHLAP aEKVS IMAD a 
TGCGGCGGCTGCATACGCTTGATCCGGCTACCTGCCCATTCGACCACCAAGCGAAACATCGCATCGAGCGAGCACGTACTCGGATGGAA GCCGGTCTTGT 

acgccgccgacgtatgcgaac taggccgatggacgggtaagctggtgg ttcgctttgtagcgtagctcgctcgtgcatgagcctaccttcggccag *a<" '• M&; 
mrrlhtldpatcpfdhqaichr i erartrmeag l "v 

cgatcaggatgat ctggacgaagagcatcaggggctcgcgccagccgaactgttcgccaggctcaaggcgagcatgcccgacggcga ggatctcgtcgtg 

GCTAGTCCTACTA6ACCT6CTTCTCGTAGTCCCCGAGCGCGGTCGGCTTG ACAAGCGGTCCGAGTTCCGCTCGTACGGGCTGCCGCTCCTAGAGCAG'"AC ^ 
° ° ° ° L ° - E E H Q G »■ * * A E > F A R L K A S M p Q G E p L ,. v 

ACCCATGGCGATGCCTGCTTGCCGAATATC ATGGTGGAAAATGGCCGCTTTTCTGGATTCATCGACTGTGGCCGGCTGGGTGT GGCGGACCGCTATCAGG 

TGGGTACCGCTACGGACGAACGGCTTaTiMtTArr irrTTrr.^rrrr. . . .1 i_ _ 1 1 1 ' 1 I OiW 

' 1 •'"■" >Jl - UMM ' ,MO " , - , - , » AUI »»l-lljAtACCGGCCGACCCACACCGCCTGGCGATA3Trr "" 

T H G D A C L P N ! M V E N G R F S G F I D C G R L G V A 0 R y q" 
ACATAGCGTTGGCTACCCGTGATATTGCTG AAGAGCTTGGCGGCGAATGGGCTGACCGCTTCCTCGTGCTTTACGGTATCGCCGCTCCC GATTCGC AGCG 

tgtatcgcaaccgatgggcactataacgacttctcgaaccgccgcttacccgactggcgaaggagcacgaaatgccatagcggcgagggctaagcg tcg 1 * C,0C 

^ ^ A^-A TR D I AEELGGEW ADR FLVLVG 1 AAPDSQR 
CATCGCCTTCTATCGCCTTCTTGACGAGT TCTTCT6AGCGGGACTCTGGGGTTCGAAATGACCGACCAAGCGACGCCCAACCTG CCATCACGAGATTTCG 

gtagcggaagatagcggaagaactgctcaagaagactcgccctgagaccccaagctttactggctggttcgctgcgggtIggacggtagIgctctaaagc 

' A ' Y " L > ° E F F • AG »• W « S K . P T K » R P T C H H E , S 
ATrCCACCGCCGCCTTCTATGAAAGGTTGGGCT TCGGAATCGTTTTCCGGGACGCCGGCTGGATGATCCTCCAGCGCGGGGAT CTCATGCTGGAGTTrTT 

taaggtggcggcggaagatactttccaacccgaagccttagcaaaaggccctgcggccgacctactaggIggtcgcgcccctagagtacgacct-aagaa 

IPP PPSMKGWASESFSGTPAG .SSSAG ISCWSS 
CGCCCACCCTAGGGGGAGGCTAACTGAAACAC GGAAGGAGACAATACCGGAAGGAACCCGCGCTATGACGGCAATAAAA AGACAGAATAAAACGCA=G GT 

gc3ggtgggatccccctccgattgactttgtgccttcctctgttatggccttccttgggcgcgatactg ccgttatttttctgtcttattttgcgt3'*'"a aa °' : 

' ' 1 G G G • L * H G R R ° V " E P A l ■ R 0 . K 0 „ , K B T -, 

GTTGGGTCGTTTGTTCATAAACGCGGGGTTCGGT CCCAGGGCrGGCACTCrGTCGATACCCCACCGAGACCCCATTGGGGCC AATACGCCCGCGTTrrTT 
CAACCCAGCAAACAAG ^atttgcgcccc AAGCCAGGGTCCCGACCGTGAGACAGCTATGGGGTGGCTCTGGGG TAACCCCGGTTATG CGGGCGC AAAGAA ^ 
■ L . G L F ' " A G F G P R A G T L S I P H n D P IGANTPAFL 



CCTTTTCCCCACCCCACCCCCCAAGTTCGGG TGAAGGCCCAGGGCTCGCAGCCAACGTCGGGGCGGCAGGCCCTGCC ATAGCCTCAGGTTACTCATATAT 

ggaaaaggggtggggtggggggttcaagcccacttccgggtcccgagcgtcggttgcagccccgccgtccgggacggtaIcggagtccaatgagtatat; ^ 

■ P " " T " ° V W V K A Q G S 0 P T S G R Q A L P . P Q y T H I 

ACTTTAGArTGATTTAAAACTTCATTTTTAATTT AAAAGGATCTAGGTGAAGATCCTTTTTGATAATCTCATGACCAA AATCCCTTAACGTGAGTTTTCG 

tgaaatctaactaaattttgaagtaaaaattaaattttcctagatccact tctaggaaaaac i"attagagtactggttttagggaattgcac tc aaaagc " 7i ' : 

" R ' ' • " F 1 ' " L " G 5 " • » S F l . , s ■ P k S L N V S F P 

TTCCACTGAGCGTCAGACCCCGTAGAAAAGATCAAA GGATCTTCTTGAGArCCTTTTTTTCTGCGCGTAATCT GCTGCTTGCAAACAAAAAAArrA,-,,- 

aaggtgactcgcagtctggggcatcttttctagtttcctagaagaactctaggaaaaaaagacgcgcattagacgacgaacgtttgtttttttggt 3G'"-' ; 

3 r E R 0 T p • - * » s - * ° l ^ ' lp f c a . s . , c K Q , H H ; 
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T,CCAGC G GT G GTTT G TTT G CCG G ATCAAGA G CTACCAACTCTTTTTCCGAAGG T .ACTG G CTTCAGCAGAG CGCA G ATACC... T .rr.,^ TT . T ..- 

ArGGTCGCCACCAAACAAACGGCCTAGTTCTCGATGGTTGAG AAAAA GG CTTCCAT T G ACCGAAGTCGTCrc G CGTCTATGGrTTAT G ACA G GAAG *TC ' ^ 
' 0 " ' V ■ " '• K - S Y ° L F F " » • «• A 3 A E R R Y Q | l S F " 



G7AGCCGTAGTTAGGCCACCACT TCAAG A A CTCTGTA G CACC G CCTACATACCTCGCTCrGCTAATCCTGTTACCAG TGGCrGCTCrr ai:ti:<;<-/; a -r . 
C,^TC G GCArCAATCCGGT G GTGAAGTTCTTGAGACArCGTGGCGGATG ^TGGAGCGA GAlGAf rAGGACAATGGTCACCGACGACGGTCACCGCTA ^'T ^' 
' S " 5 • ' T - T 5 " T L • H " L " T « L C . S C V Q W U L P y A , ; 
TCGTGTCTTACCGGGTfGG AC ^A^ A ^^^^^^^^^9A^^^^^ A ^^GGTCGGGCTGAACGGGGGGTTCGT G CACACA G CCCARrTTf:KAnr<;*A 
AGCACA G AAT G GCCCAACCrGAGTrCTGCTATCAArG G CCTAr T CCGCG;CGCCA G CCCGACrTGCCCcicAAGCAC G TGTGrCG G GTCGAACCTCGC T ; "« 

■ P G V T ° ° ° 5 Y R ' R R s G » a e R G y R A H s p A w s E 

CGACCTACACC G AACTGA G ATACCTACAGC G T G A G CTATG A GAAAGCGCCACGCTTCCCGAAGGGA G AAA GG C G GACA G G TATCC G GTAAi;r(;i;rAnf:.-^ 
' PTYS VSYEK APRFP Kggppyg IR . A A G 

1^^^^^^^^^^^^^^^^^^^^^^^^^^^^" ^^^^^^ Y ^^ Y ^ ^^^^^*'^^^^ AC( - T( ' T<;Af ' TTr:A ' ;/ *' : ■ 

GCCTTGTCCTC'f CGCGTGCrCCCTCGAAGGTCCCCCTTT G C G GACCATAGAAATATCA GG ACAGCCCA AA G CGGTGGACACTGAACTCGCAGCTAAAAA^ *** 

^ 1FIVLS G FATSDLSV/QFC 

TGATGCTC G TCAGGGGGGCG G AGCCTAT G GAAAAACGCCA G C AAC G C GG CCTTTTTAC G GTTCCTGGCCTTTTGCTGGCC TTTTGCTCACA TftTTr TTTr 

Q arqgggaygktp atrpfygsw p f a g l l L t c s f 



CTGCGTTATCCCCTGATTCTGTGGATAACCGTATTACCG CCATGCAT 

gacgcaataggggactaagacacctattggcataatggcggtacgta 5t>lt7 

«- R Y P L I L V I T V L P P C I 
fig 34 pLM4 (1 > 10070) Site and Sequence
Enzymes : 100 of 146 enzymes (Filtered)

Settings: Linear. Certain Sites Only. Standard Genetic Code

TAGTTATTAATAGTAATCAATTACGGGGTCAT^ 

ATCAATAATTATCATTAGTTAATGCCCCAGTAATC AAGT^ TGGC ' ^ 

I r 1 ' "" _ 



LLIVINYGVISS 



PIYGVPRYITYGKW 



P A V L T 



CCCAACGACCCCCGCCCATTGACGTC 

1 i . I i i i i i . 



AATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGrCAATGGG 



TGGAGTATTTACGGT 



GGGTTGCTGGGGGCGGGTAACTGCAGTTATTACTGCATACAAGGGTArCATTGCGGTTATCCCTGAAAGGTAACTGCAGTTACCCACCTCAT, 



AAATGCCA 



-dCMV 



A Q R P p p I 0 V N N 0 V C S H S N A N R Q F. P L T S M G G V F T V 

AAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCA 
TTTGACGGGTGAACCGTCATGTAGTTCACATAGTATACGGTTCATGCGGGGGATAACTGCAGTTACTGCCATTTACCGGGCGGACCGTAA^ 



PCMV 

NCPLGSTSSVSYAKYAP 



R Q 



marlalcpv 



CATGACCTTATGGGACTTTCCTACTTGGCAGTACATC ^ACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGG CGTGGA 

gtactggaataccctgaaaggatgaaccgtcatgtagatgcataatcagtagcgataatggtaccactacgccaaaaccg tcatgtagttacccgcacct aC0 



H ° L . M G 1 S Y L A , V H L R t S H R Y Y H G 0 A V L A V H 0 V A V 
TAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACT 



ITTTCCAAAATGTCGTh 



atcgccaaactgagtgcccctaaaggttcagaggtggggtaactgcagttaccctcaaacaaaaccgtggttttagttgccctgaaaggItttacagcat 500 



-pCMV 



I A V 



LTGISKSPPH 



ROVEFVLAPKSTGLSKMS. 



ACAAC 



TCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTGGTTTAn 



TGAACCGTCAGATCCGCTAGCGCTA 



7GTTGAGGCGGGGTAACTGCGTTTACCCGCCATCC GCACATGCCACCCTCCAGATATATTCGTCTCGACCAAATCACTTGGCAGTCTAGGCGATCGCGA7 6 °° 



-pCMV 



0 L R P IDANGR 



ACTVGGLVKQSWFSEPSD 



P L A L 



CCGGTCGCCACCATGG 



TGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCAC 



AAGTTC AG'- 5 



GGCCAGCGGTGGTACCACTCGTTCCCGCTCCTCGACAAGTGGCCCCACCACGGGTAGGACCAGCTCGACCTGCCGCTGCATTTGCCGGTGTTCAAGrCGC ? * 




VATMVSKGEELFT 



G V V P 



LVELDGDVNGHKFS 



jTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGC 



TGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGAC 



^caggccgctcccgctcccgctacggtggatgccgttcgactgggacttcaagtagacgtggtggccgttcgacgggcacgggaccgggIgggagcactg 



vsgegegoatyg 



K L T LKFICTTGKLPVPWPTLVT 



caccctgacctac ggcgtgcagtgcttcagccgctaccccgaccacatgaagcagcacgacttcttcaagtccgccatg^ 

GTGGGACTGGATGCCGCACGTCACGAAGTCGGCGATGGGGCTGGTGTACTrCGTCGTGCTGAAGAAGTTCAGGCGGTACGGGr 



7 > T . Y G V Q CFSRYPDHHKOHOFFKSAH 



TTCCGATGC AGGTCC 



P E G Y V 0 E 



fig 34 pLM4 (1 > 10070) Site and Sequence



c e c, c c.,cT,c,,c», M «a, c M c.,cr.c., 5< ccc S c em «o, cwr , CCMeK6M MC „ mm ,,, fi , ~ • 



YKTRAEVKFE 



G 0 T L V f j R 



I E L K G 



o....«« M1 , TOl „ 1 . „ 1M iMm tii ^ 



DFKEOGNIL 



^^^LEYNYNSHNvyrM ^J_J 1 _ : nr , 7 

GGTGAACTTCAAGAT CCGCCACAACATCGAGG ACGGCAGCGTGCAGrTrf:rrf:Ar/-*r t ' ' "~ 




TCACfCTCGGCATGGACGAGCTGTACAAGTCCGGACTCAGATCTrfiAfirTr AArf tt#-«-a -ttp 



I TLGMDEl Yk* Qrt rt „ 

' ■ G L " S » A ° A 5 " S A V 0 K L D , F F , a n 

CCTGCTCTTCAGCCAGA TGCTGGACCCAGAGTCCCAGAG AAAr.^^..^^..^^^^^ ~^ " 



fig 34 pLM4 (1 > 10070) Site and Sequence

GTACATGCACGGCGAACGGGCCCACTACTCCCACACCATGCCCATGCGCAGCCCCAGCAAGCTCAGCCATATCTCCCGCCTGGAGCTGGTCGAATCCCTG
CATGTACGTGCCGCTTGCCCGGGTGATGAGGGTGTGGTACGGGTACGCGTCGGGGTCGTTCGAGTCGGTATAGAGGGCGGACCTCGACCAGCTTAGGGAC



-insert pLM1



-ORF pLM1



YMHGERAHYSHTMPMRSPSKLSHISRLELVESL

GACTCGGATGAGGTGGACCTCAAGTCCGGCTACATGAGCGACAGTGACCTCATGGGCAAGACCATGACGGAGGATGATGACATCACTACCGGCTGGGATG 


CTGAGCC TACTCCACCTGGAGTTCAGGCCGATGTACTCGCTGTCACTGGAGTACCCGTTCTGGTACTGCCTCC TACTACTGTAGTGATGGCCGACCCTAC 



-insert pLM1



-ORF pLM1



DSDEVDLKSGYMSDSDLMGKTMTEDDDITTGWD

AAAGC AGCTCCATCAGTAGTGGACTCAGCGATGCCTCAGACAATCTCAGTTCAGAAGAATTCAATGCCAGC TCCTCACTCAAC TCCCTCCCAAGTACTCC 


TTTCGTCGAGGTAGTCATCACCTGAGTCGCTACGGAGTC TGTTAGAGTCAAGTCTTCTTAAGTTACGGTCGAGGAGTGAGTTGAGGGAGGGTTCATGAGG 



-insert pLM1



-ORF pLM1



ESSSISSGLSDASDNLSSEEFNASSSLNSLPSTP
CACTGCTTCTCGCAGGAAC TCAACAATAGTGCTACGCAC AGACTCAGAGAAGCGCTCACTGGCAGAAAG TGGGCTGAGCTGGTTTAGTGAATCAGAGGAG 


GTGACGAAGAGCGfCCTTGAGTTGTTATCACGATGCGTG TCTGAGTCTCTTCGCGAGTGACCGTCTTTCACCCGACTCGACCAAA TCACTTAGTCTCCTC 



-insert pLM1



-ORF pLM1



TASRRNSTIVLRTDSEKRSLAESGLSWFSESEE

AAAGCCCCTAAAAAACTGGAGTACGACAGTGGTAGCCTGAAGATGGAACCTGGGACTTCTAAGTGGCGGAGGGAGCGGCCTGAGAGCTGTGATGATTCAT
TTTCGGGGATTTTTTGACCTCATGCTGTCACCATCGGACTTCTACCTTGGACCCTGAAGATTCACCGCCTCCCTCGCCGGACTCTCGACACTACTAAGTA



-insert pLM1 



-ORF pLM1 



KAPKKLEYDSGSLKMEPGTSKWRRERPESCDDS

CCAAGGGTGGAGAACTGAAAAAGCCCATCAGCCTGGGCCACCCTGGTTCCCTGAAGAAGGGCAAGACCCCACCTGTGGCTGTAACTTCCCCCATCACTCA
GGTTCCCACCTCTTGACTTTTTCGGGTAGTCGGACCCGGTGGGACCAAGGGACTTCTTCCCGTTCTGGGGTGGACACCGACATTGAAGGGGGTAGTGAGT 



-insert pLM1



-ORF pLM1



SKGGELKKPISLGHPGSLKKGKTPPVAVTSPITH



fig 34 pLM4 (1 > 10070) Site and Sequence

C.CAGCCCAGAGTGCCCTCAAAGrCGCAGGCAAACCrGAGGGC.AAGCT ACA r. flf -.^ 







^^^^ 



5AGCCCGTAACGA6CGGGGA Cfi T fiA ;.r^^; 



'insert pLM1 



S 0 A G 



* D R L S D 



A K K P p 



■ORF pLMl 



s G I A R p 



F G Y ic (c 



P P 



TGAGAGTCGTTCTAGGTCTTC^a^^^ 




^GACTAGCTTAGATGTTTrrA.r^^.- 



TGCCCCGGCCAGCl'AA^ 




ACTCGrCGTCGTAACTGGGGTCA^nr.^T.. 



TGGfTCGTCCC 7C 




^ P . R P V S $ 5 I D P 



5 , L L 3 T K q 



G G L T 



:ACCCGCCTG G TGAGGT CGGGGACAGrTAGTCTGTr^ ^^: 
-insert pLMl 



TTCCGGTfTCGGTrcCG 




fig 34 pLM4 (1 > 10070) Site and Sequence

AGTGGCC TTGGACTCAGACAACATCTCCTTGAAGAGTATTGGCTCCCCAGAGAGTACTCCCAAGAACCAAGCAAGCCACC^ ^ 

7CACCGGAACCTGAGTCTGTTGTAGAGGAACTTCTCATAACCGAGGGGTCTCTCATGAGGGTTCTTGGTTCGTTCGGTGGGGTGTCGGTGGTTCGACCGT 



- insert pLM1 



-ORF pLM1 



VALDSDNISLKSIGSPESTPKNQASHPTATKLA


GAGCTGCCACCAACCCCTCTCAGGGCCACAGCGAAGAGCTTTGTCAAACCACCCTCACTAGCCAATC 

CTCGACGGTGGTTGGGGAGAGTCCCGGTGTCGCTTCTCGAAACAGTTTGGTGGGAGTGATCGGTTAGAACTGTTCCAGTTGAGGTTGTCAGACCTAGATG 



-insert pLM1 



-ORF pLM1 



ELPPTPLRATAKSFVKP P S L A N L P K V N S N S L 0 L 
CATCATCCAGTGATACCACCCATGCTTCAAAGGTCCCAGATCTGCATGCTACAAGCTCAGCATCTGGGGGCCCTCTCCCTTCCTGCTTCACCCCCAGTCC 


GTAGTAGGTCACTATGGTGGGTACGAAGTTTCCAGGGTCTAGACGTACGATGTTCGAGTCGTAGACCCCCGGGAGAGGGAAGGACGAAGTGGGGGTCAGG 



* insert pLM1 



-ORF pLM1 



PSSSDTTHASKVPDLHATSSASGGPLPSCFTPSP
GGCACCCATCCTCAATATTAACTCAGCCAGCTTCTCCCAGGGCCTGGAGCTAATGAGTGGTTTCAGTGTGCCAAAAGAGACCCGCATGTACCCCAAACTC 


CCGTGGGTAGGAGTTATAATTGAGTCGGTCGAAGAGGGTCCCGGACCTCGATTACTCACCAAAGTC ACACGGTTTTCTCTGGGCGTACATGGGGTTTGAG 



•insert pLM1 



-ORF pLM1 



APILNINSASFSQGLELMSGFSVPKETRMYPKL

TCAGGCCTGCACAGGAGCATGGAGTCCCTCCAGATGCCAATGAGCCTCC CCAGTGCCTTCCCCAGCAGTACTCCCGTCCCCACCCCACCTGCTCCCCCTG 
AGTCCGGACGTGTCCTCGTACCTCAGGGAGGTCTACGGTTACTCGGAGGGGTCACGGAAGGGGTCGTCATGAGGGCAGGGGTGGGGTGGACGAGGGGGAC 



-insert pLM1



-ORF pLM1 



SGLHRSMESLQMPMSLPSAFPSSTPVPTPPAPP 


CTGCTCCCACAGAAGAAGAGACGGAAGAGCTGACTTGGAGTGGAAGCCCCAGAGCTGGGCAACTGGACAGTAATCAGCGGGATCGGAACACTCTTCCCAA


GACGAGGGTGTCTTCTTCTCTGCCTTCTCGACTGAACCTCACCTTCGGGGTCTCGACCCGTTGACCTGTCATTAGTCGCCCTAGCCTTGTGAGAAGGGTT 



-insert pLM1 



-ORF pLM1

AAPTEEETEELTWSGSPRAGQLDSNQRDRNTLP*



fig 34 pLM4 (1 > 10070) Site and Sequence



JCCGCT6TAAGGGTATGGTAArr A rrr q ,. 



TGACCAGTCAGALi 





TCGCCGAGG G TTTACCCcT^^7r,^..-. rrrT ;- 
-insert pLM1



TGGGTTCCCTTACTAAGC 



J6CC 




GTCAGGAT 




C*ATC TG*GC*AArcCGCA«rTTrr.T.^ 1 , nr 




fig 53 pLM6 (1 > 4947) Site and Sequence

agaagga ggtatcggagctgcgctctgagctatgggagaaggaaatgaagc ttacagacatccgcttggaggccctcaactctgcccaccaactggatca 

T rTtrCTCCATAGCCTCGACGCGAGACTCGATACCCTCTTCCTTTACTTCGAATGTCTGTAGGCGAACCTCCGGG4GTTGAGACGGGTGGTTGACCTAG7 



-U3 stuk 



-ORF 



KKEVSELRSELWEKEMKLTDIRLEALNSAHQLDQ

GCTTCGGGAGACCA TGCACAACATGCAGTTGGAGGTGGACCTGCTGAAAGCAGAGAATGACCGACTGAAGGTAGCCCCAGGCCCCT^ :5C , ; 
rnAAGCCCTCTGGrACGTGTTGTACGTCAACCTCCACCTGGACGACTrTCGTCTCTTACTGGCTGACTTCCATCGGG ~* 



- U3 stuk 



-ORF 



L RE TMHNMQLt v u I l * m ^ « " ^ ^ '\ » " V," , w 7 ~ ~ . 
CCAGGGC AGGT CCCTGGATCATCTGCATTATCTTCCCCACGCCGCTCCCTAGGCCTGGCACTCACCCATTCCTTCGGCCCCA^ 

GGTCCCGTCCAGGGACCTAGTAGACGTAATAGAAGGGGTGCGGCGAGGGATCCGGACCGTGAGTGGGTAAGGAAGCCGGGGTCAGAACGTCTGTGTCTGG 



-U3stuk 



-ORF 



PGQVPGSSAtSSPRR 



SL6LAL THSFGPSLADTO 



TGTCACCCATGGATGGCATCAG TACTTGTGGTCCAAAGGAGGAAGTGACCCTCCGGGTGGTGGTGAGGATGCCCCCGCAGCACATCATCAAAGGGGACTT ^ 
ACAGTGGGTACCTACCGTAGTCATGAACACCAGGTTTCCTCCTTCACTGGGAGGCCCACCACCACTCCTACGGGGGCGTCGTGTAGTAGTTTCCCCTGAA 



-U3 stuk 



-ORF 



LSPMDGISTCGPKEEVTLRVVVRMPPQHIIKGDL

GAAGCAGCAGGAATTCTTCCTGGGCTGTAGCAAGGTCAGTGGAAAAGTTGACTGGAAGATGCTGGATGAAGCTGTTTTCCAAGTGTTCAAGGACTATATT
CTTCGTCGTCCTTAAGAAGGACCCGACATCGTTCCAGTCACCTTTTCAACTGACCTTCTACGACCTACTTCGACAAAAGGTTCACAAGTTCCTGATATAA



- U3 stuk 



-ORF 



KQQEFFLGCSKVSGKVDWKMLDEAVFQVFKDYI



rCTAAAArG GACCCAGCCTCTACCCTGGGACTAAGCACTGAGTCCATCCATGGCTACAGCATCAGCCACGTGAAACGAGTGTTGGATGCAGAGCCCCCCG 
AGATTTTACCTGGGTCGGAGATGGGACCCTGATTCGTGACTCAGGTAGGTACCGATGTCGTAGTCGGTGCACTTTGCTCACAACCTACGTCTCGGGGGGC 

— U3stuk " ' 

— ORF — 

SKMOPASTLGLSTESI HGYS 1 SHvKRVLOA E P P 



fig 53 pLM6 (1 > 4947) Site and Sequence




K PM MQHY *^*"*"***^^***-V*-SGPSGTG*Tvi t ». ~ 



™™ c ^^ : 

^^^ACCGGCACTCCAGTGTCTCCC^ 



-U3 stuk 



-ORF 



TmuAAAGGT^TGGATCGGTTGGTCTATCTGGCCCTTTGTCCTTAACCCC TArAC G^G^ATcTcTA^G^TA^ftCCT^Tn ^r t 



-U3 stuk 



-ORF 



' — ' L i T G GOVPL V I LLP D L S E A C S I 3 




r ^^^ ; 

rc&CCTCGG ^^^^ ^^^^ A ^C^CCAAGCAATG GACTCCTCCTTCGACCATCTC ^GTcTGTc<ir 



2cX 




OflF 



fig 53 pLM6 (1 > 4947) Site and Sequence

ACATCAATGCCAACAAGGAAGAGCTGCTTCGGGTGCTCGACTGGGT^ 

TGTAGTT ACGGTTGTTCCTTCTCGACGAAGCCCACGAGC TGACCCATGGCTTCGACACCATAGTAGAGGTG TGGA AGGAAC TC T TCGTGTCGTGG AG TC" ^ 



-U3stuk 



-ORF 



0 1 N A N K £ , E <- L .» V L 0 V V P K L V Y H L H T FLEKHSTSD 

CTTCCTCATCGGCCCTTGC TTCTTTC TGTCGTGTCCCATTGGCATTGAGGACTTCCGGACCTGGTTCATTGACCTGTGGAAC AAC TCTATCATTCCCTAT 
GAAGGAGTAGCCGGGAACGAAGAAAGACAGCACAGGGTAACCGTAACTCCTGAAGGCCTGGACCAAGTAACTGGACACCTTGTTGAGATAGTAAGGGAT 



J7CC 



-U3stuk 



-ORF 



F L G P C F F L S C P I G I E 0 F R T V F I D L y N N S I t P V 

CTACAGGAAGGAGCCAAGGATGGGATAAAGGTCCATGGACAGAAAGCTGCTTGGGAGGACCCAGTGGAATG GG TCCGGG AC AC AC 7TCCCTGGC C A TC AG 

gatgtccttcctcggttcctaccctatttccaggtacctgtctttcgacgaaccctcctgggtcaccttacccaggcccIgtgtga MC 



-U3stuk 



-ORF 



L Q E G AKDG1 K V H G Q K A A V E D P V£VVRDTLP'-P3 

CCCAACAAGACCAATCAAAGC TG ^^CCACCTGCCCCCACCCACCGTGGGCCCTCACAGCATTGCCTCACCfCCCGAGGATAGGACAGTCA AAGACAGCA^ 
GGGTTGTTCTGGTTAGTTTCGACArGGTGGACGGGGGTGGGTGGCACCCGGGAGTGTCGTAACGGAGTGGAGGGCTCCTATCCTG TCACTTTCTGTCGTG ^ 



-U3 stuk 



-ORF 



^Q QDQSK L YH LPPPT VG PHS1 ASPPEDR T V K D S T 

CCCAAGTTCTCTGGACTCAGATCCTC TGATGGCCATGCTGCTGAAACTTCAAGAAGCTGCCAACTACATTG AG TCTCCAGATCG AGAAACC ATCCTGGAC 
GGGTTCA^GAGACCTGAGTCTAGGAGACTACCGGTACGACGACTTTGAAGTTCTTCGACGGTTGATGTAAC TC AGAGGTCTAGCrCTTTGGTAGGACCT^ 



-U3 Stuk 



-ORF 



P S 3 L D S 0 P L M A M L L K L Q E A A N Y | £ S P 0 R E T I L D 
CCCAACC TTC AGGCAACACTTTAAGGGTTCGGCAATCACTGTCACCCCCGGACAGCAGAACGCTGGCATCAGC TATCTTAGC TCCTCCTCTCCCCTCTCC 

gggttggaagtc cgttgtgaaattcccaagccgttagtgacagtgggggcctgtcgtcttgcgaccgtagtcgaIagaaIcgao 

. ll->eh.lf 

— — ► 

ORF 1 

=>NL QAT L.GF GNH CH PRTAERVH0LS...LL3PL 



fig 53 pLM6 (1 > 4947) Site and Sequence

TCTTTCAGAGCACTGGCTCTCCAGCCCCAGGAGGAGAACAGGAGGGAGGAGGAGATGAAAGAGGAGGGACAGGTT CTTGGTGC TG TACC TTTGAGAACT" 
AGAAAGTCTCGTGACCGAGAGGTCGGGGTC CTCCTCTTG ^CCTCCCTCCTCCTCTACTTTCTCCTCCCTGTCC AAGAACCACGACATGGAAACrCTTGAA 



•U3 stuk 



L F Q S T 6 S P A P G G E Q E G G G D E R G G T G 3 V C C T F E N F 
CCTAGGAAGGAATGGTGGGGTGGCGTTTGGGAACTTGTGCCCCCTAAACACATTTACTGGCCTCCTCTAGAGCGGCCGCCA CCGCGGTGGAGCTCCAATT 

ggatccttccttacca ccccaccgcaaacccttgaac acgggggatttgtgtaaatgaccggaggagatctcgccggcggtggcgccacctcgagg ttaa C: ' iA 



— U3stuk 

lgrnggvafgnlcplntftg 



ll.sgrhrggapi 



cgccctatagtgagtcgtattacgcgcgctcactggccgtcgttttacaacgtcgtgactgggaaaaccctggcgt^ 

gcgggatatcactcagcataatgcgcgcgagtgaccggcagcaaaatgttgcagcactgacccttttgggaccgcaatgggttgaattagcggaacgt^ aiiC ' 

R P I V S R I T R A H V P S F Y N V V T G K T L A L P N L I A L 0 



acatccccctttcgccagctggcgtaatagcgaagaggcccgcaccgatcgcccttcccaacagttgcgcagcctgaatggcgaatggg acgc^ 
tgtagggggaaagcggtcgaccgcattatcgcttctccgggcgtggctagcgggaagggItgtcaacgcgtcgga^ ^ a 

H 1 P L S P A . G V ' , A K » ? A P I A L P N S C A A . M A N G T R P V 

' ' ' I ■ I I I ... I ■ . . , l 

AGCGGCGCATTAAGCGCGGCGGGTGTGGTGGTTACGCGCAGCGTGACCGCTACACTTGCCAGCGCCCTAGCGCCCGCTCC TTTCGCTTTCTrcCCTTCC- 
TCGCCGCGTAATTCGCGCCGCCCACACCACCAATGCGCGTCGCACTGGCGATGTGAACGGTCGCGGGATCGCGGGCGAGGAAAGCGAAAGAAGGGAAGGA 
A A H . AR RVVWLRAA . P LH LPAP. RPLLS L S S L P 

TTCTCGCCACGTTCGCCGGCTTTCCCCGTCAAGCTCTAAATCGGGGGCTCCCTTTAGGGTTCCGATTTAGTGCTTTACGGCACCTCGA CCCCAAAAAACT 

aagagcggtgcaagcggccgaaaggggcagttcgagattIagcccccgagggaaatcccaaggctaaatcacgaaatgccgtggagctggggttttttga £?i ' 

F S P R S P A F P V k L . 1 G G S L ■ G S D L V L V G T S T P K N 

tgattagggtgatggttcacgtagtgggccatcgccctgatagacggtttttcgccctttgacgttggagtccacgttctttaatagtggactc ttgttl- 
actaatcccactaccaagtgcatcacccggtagcgggac tatctgc caaaaagcgggaaactgcaacctcaggtgcaagaaattatcacctgasaacaa^ 

I I R V M V H W VG HR P 0 R R F F A L . R V s P R S L I V D S C 3 

CAAACrGGAACAACACTCAACCCTATCTCGGTCTATTCTTTTGATTTATAAGGGATTTTGCCGATrTCGGCCTATTGGTTAAAAAATGAGCTGATTTAA: 
GTTTGACCTTGTTGTGAGTTGGGATAGAGCCAGATAAGAAAACTAAATATTCCCTAAAACGGCTAAAGCCGGATAACCAATrTTTTACTCGACTAAATTG ^ 
KLE QHS TLSRSILL IV KGFCRFRP IG . K M S . F M 

AAAAATTTAACGCGAATTTT AACAAAATATTAACGCTTACAATTTAG 

' 1 1 ' ' I i |ii p- UQtiJ 

TTTTTAAATTGCGCrTAAAATTGTTTTATAATTGCGAATGTTAAATC 
« N I T "ILTICY.RL0FR 



fig 54 pLM1 (1 > 8265) Site and Sequence
Enzymes : 72 of 146 enzymes (Filtered)

Settings : Circular, Certain Sites Only. Standard Genetic Code

GTGGCACTTTTCGGGGAAATGTGCGCGGAACCCCTATTTGTrTATTTTTCTAAATACArTCAAATATGT ArCCGCTCATGAGACAArAACCCTGATAAAT 

CACCGrGAAAAGCCCCrTTACACGCGCCTTGGGGATAAACAAATAAAAAGATTTATGTAAG7TTATACATAGGCGAGTACTCTGTTArTGGGACTArTTA ^ 
G G T F R G N V ft G T P [ C L F F , 1 H 5 N M Y P L M R Q . p , M 

GCTTCAATAATATrGAAAAAGGAAGAGTATGAGrATTCAACATTTCCGTGrCGCCCTTATTCCCrTT rTTGCGGCATTTrGCCTTCCrGTTrTTGCTCAC 

cgaagttattat aactttttccttctcatactcataagttgtaaaggcacagcgggaataagggaaaaaacgccgtaaaacggaaggacaaaaacgagtg 200 

L Q ■. Y : * ft * S H S 1 Q H F R V A L I P f F A AFCLPVFAM 
CCAGAAACGCTGGrGAAAGTAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAACTGGATCTCAACAGC GGTAAGATCCTTGAGAGTT 

ggtctttgcgaccactttcattttctacgacttc™^ 300 

PE T LVKVKQA EDQ LGARU G Y I E L OLNSGK I L E S 

ttcgccccgaagaacgttttccaatgatgagcacttttaaagttctgctatgtggcgcggtattatcccgtattgacgccgggcaagagcaa ctcggtcg 
aagcggggcttcttgcaaaaggttactactcgtgaaaatttcaagacgatacaccgcgccataatagggcataactgcggcccgttctcgttgagc^ q °° 

F ft P E E R F P H H $ T F K V L L CGA VLSRIDAGOEQLGR 

ccgcatacactattctcagaatgacttggttgagtactcaccagtcacagaaaag CATCTTACGGA TGGCATGACAGTAAGAGAATTATGCA6T GCTGCC 
GGCGTATGTGATAAGAGTCTTACTGAACCAACTCATGAGTGGTCAGrGTCTTTTCGTAGAATuCCTACCG TACTGrCATTCTCrTAATACGTCACGACGG *°° 
ft I H V S Q N Q L V E Y S P V T E K H L T Q G M T V R E t C S A A 

AfAACCATGAOTGATAACACTGCGGCCAACTTACTTC TGACAACGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGCACAACATGGGGGATCATGTAA 
rATTGGTACTCACTATrGTGACGCCGGTTGAATGAAGACTGrTGCTAGCCTCCTGGCTTCCTCGATTGGCGAAAAAACGTGTTGTAC 6 °° 
I T M S D N f A A N L L L T T I G G P K £ L T A F I H N H G 0 H V 

CrCGCCTTGATCGrTGGGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGACACCACGArGCCTGTAGCAATGGCAACAAC GTTGCGCAAACT 
GAGCGGAACTAGCAACCCTTGGCCTCGACrTACTTCGGTATGGTTrGCTGCTCGCACTG TGGrGCTACGGACATCGTTACCGTTGTTGCAACGCGTTTGA ™ > 
T ft t 0 R V £ P £ L N E A j PNO ERP TT MPVAM AT T L R K L 

AT TAACTGGCGAACTACrTACrCTAGCTTCCCGGCAACAATTAATAGACTGGATGGAGGCGGATAAAGTTGCAGGACCACTTCTGCGCT CGGCCCTTCCG 
TAATTGACCGCTTGATGAATGAGATCGAAGGGCCGTTGrTAATTATCTCACCTACCTCCGCCrAnTCAACGTCCTGGTGAAGACGCGA^ 8 °° 
L T G E L L T I A S R Q Q L 1 D V N £ A 0 K V A G PLLRSALP 

GCTGGCTGGTT TATTGCTGATAAATCTGGAGCCGGTGAGCGTGGGTCTCGCGGTATCArTGC^CACrGGGGCCAGA 

C-iACCGACCAAATAACGACTATTTA-a^ 900 

AGV F|A0K SG AGE RGSRGII AAL GPOGKPSR IV 
TTmTC TAC ACG ACGGGGAGTCAGGCAAC TATGGATGAACGAAA TAGACAGA TCGCTGAGATAjGTGCCTCAC TGATTAAGCATTGGTAACTGrCA GACCA 
A4TAGATG TGCTGCCCC TCAGTCCGTTGATACC TACTTGCT TTATCTGTC TAGCGAC ^CTAT^CACGGAGTGACTAATTCGTAACCATTGACAGTCrGGT ^ 
V 1 Y T f G S 0 A f * P E R N R Q I A £ | G A S L [ K H V . L S D Q 

AGTTfACTCAT ArArACTTTAGATTGATTTAAAACTTCATTTTTAArTTAAAAGGATCTAGGrSAAGATCCTTTTTGATAATCTCATGACCAA 
rCAAATGAGrATATATGAAATCTAACTAAATTT TGAAGfAAAAAT I'AAATTTTCCTAGATCCACTTCTAGGAAAAACTATTAGAGTACTGGTTTTAGGGA "** 
? ^ . * 'OyKLMF.FK« | . V K I L F D N L H T < t P 

TAAL'GTGAGrTT TCGTTCCACrGAGCGTCAGACCCCGTAGAAAAG ATCAAAGGATCTTCrTGASATCCTTTTTTTCTGCGCGTAATCTGC TCCTTGCAA A 

attgcactcaaaagcaaggtgactcgcagtctggggcatcttttctagtItcctagaagaactctaggaaaaaaagacgcgcattagac^ 1200 

• * E F S F " • A S P P V £ K 1XGSS .0P FF LRV 1 C C L 0 
CAAAAAAACCAC CGCTACCAGCGSTGGTm^ 

^^^^^^^^^^^^^^^ A ^^ ^ A ^^ AA *^ AAA ^ G ^*CCTAGTTC TCGATGGTTGAGAAAAA23CTTCCATTGACCGAAGTCGrCTCGCGTC TATGGTTT ,3 °° 
T y « P P L P A V y C L P D Q E L P T L F P K V T G F S R A Q I P N 

rACTGfCCTTCT A G TG ^^GCCGTAGTTAGGCCACCACrTCAAGAACrCTGrAGCACCGCCTAlATACCTCGCTCTGCTAATCCTGTTACCAGrGG C TGC T 
A TQACAGGAAGA ^ACATCGGCATCAATCCGGTGGTGAAGTTCTTGAGACATCGrGGCGGAT HATGGAGCGAGACGArTAGGACAATGGTCACCGACGA 
r>/LLV , f> LGHHF tCNStfA PP7YLALL1LLPVAA 



fig 54 pLM1 (1 > 8265) Site and Sequence



GCCAOrcGCSATAAGTCCTGrCTTACCGGGT TGCACTCAACACGATAGTTACCCGATAAGCCGCAGCGGTCGCGCTOAACGCC GGGTTCGTfiCA^.-ACC 

CGGTCACCGCTATrCAGCACAGAATGGCCCAACCTGAGrTCTGCTATCAATGGCCTATTCCGCGTCGCCAGCCCGACTTGCCCCCCAAGCACGTGrGTCG ,M ° 
ASG OK SCL T G L 0 S fl R . LPOKAORSG . TCGSCTQ 

CCAGCTTGGAGCGAACGACCTACACCGAACTG AGATACCTACAGCGTGAGCTATGAGAAAGCGCCACGCTTCCCGAAGGG AGAAAGGCGGArA^TAr,, 
GuTCGAACC ICGC rTGC TGGATGTGGCTTGACTCTATGGATGTCGCACTCGATACTCTTTCGCGGTGCGAAGGGCTTCCC TCTriCCGCCTGTCCATAGG 
' 3 U E * f T Y T E L » * ° » ' t- ■ E .« A T t P E G R K A Q , Y p 

GGTAAGCGGCAGGG TCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTATAGTCCrGTCGGGTT TCGCCACCrCTflArTT 

CCATTCGCCGTCCCAGCCTTGTCCTCTCGCGTGCTCCCTCGAAGGTCCcicTTTGCGGACCATAGAAAUTCAGGACAGCCCAAAGCGGlGGAGACrGAi "°° 
^SG RVGTGERTRE LPGGHAVVLVS P V G F R H L t 



1600 



iaoo 



GAGCGrCGATTTTTGTGATGCrCGTCAGGGGG GCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTrACGGrTCCrGGCC rTTTGCTGGCCrrTTG 

CTCGCAOCTAAAAACACTACGAGCAGTCCCCCCGCCTCG 0 ArACCTTTTIGCGGTCGrTGCGCCGGAAA;ArGCCAAGGACCGGAAAACGACCGGAAAAC 
E " » F L ' SG GR SLVK WA S ^ AAF LRFCAf C VPF 

CrCACATGTrCTTTCCr G CGmTCCCCTGATTC T C TGGATAACCGTAT T ACCGCCTTTGAGrGAGCTGATACC6CTCGC CGCAGCCGAAr fi Arrr.,^ 

GAGTGTACAAGAAAGGACGCAATAGGGGACTAAGACACCTATTGGCATAArGGCGGAAACTCACTCGA CTATGGCGAGCGGCGTCGGCTTGCTGGCTCGC 
* H H F F P A L 8 P D S V D W R I T A F £ .A 0 T A R R S R T T E R 

CAGCGAGTCAGTGAGCGAGGAAGCGGAAGAGCGC CCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGATTCArTAA rGCAGCTGGCArnATAiinTTT 

GrCGCTCAGTCACrCGCTCCTTCGCCTTCTCGCGGGTTATGCGrTTGGCGGAGAGGGGCGCGCAACCGGCTAAGTAATMCGTCGACCGIGCTGTCCAAA 
S E S V S E E a E E R P , R K p p L p A R w p | (| CSVHORF 

CCCGACTGGA 



2000 



lAAGCGCGCAGTGAG CGCAACGCAATTAArGTGAGTTAGCTCACTCATTAGGCACCCCAGGCTTTACACTTT ATGCT 
TTCGCCCG TCAC TCGCGT TKcr. T T a \J T a r i r t r~ a a t rr *r~T i- * i~ . _i 1 1 1 



■ — — •■ ' 1 — - • i | i | : .^nvvvv^uwi i i /ty^L 1 1 'AfGCf TCCGGCTCGTATGT 

GGGCrGACCrTTCGCCCGTCACrCGCGTTGCGTrAATTACACTCAArCG^IGAGrAATCCGTGGGGrCCGAAATGT GAAATACGAAGGCCGAGCATACA 
" ° V K A G S E R N A I N y S , L T H ■ A P Q A L H F H L P A R H 

TGTGTGGAATTGTGAGCGGATAACAATTTCACACAGG AAACAGCTArGACCATGATTACGCCAAGCGCGCAATTAAC CCTCACTAAAGGGAArAAJafirT 

ACACACCTrAACACTCGCCTATIGrTAAAGTGTGTCCTTTGTCGArACTGGTACTAATGCGGTTCGCGCGTTAATTGGGAGTGATTTCCCTTGTTTTCGA ^ 
LC G I V S G 'QFHTGNS VPHDVAKRA I N P H . R E O * L 



2100 



GGGTACCGGGCCCCCCCrCGAGGTCGACGGlATCGATA AGCrTGATATCGAATTCCTGCAGCCCCTGCTCTTCAGCC AGATGCTGGACCCAGAGtrrrA, 
CLXATGGCCCGGGGGGGAGCTCCAGCTGCCATAGCTArTCGAACTATAGCTTAAGGACGTCGGGGACGAGAAGTCGGTCrACGACCTGGGTCTC^GGGTic 



2300 



-insert pLM1



gtgpplevqg idklo ieflqpllf s 0 ~Tr: — 




2100 



" " " ' ' ° " * L 0 *■ ° ° .» »■ £ E T " S S L RGSQv FHSSLE 




+ 2500 



» r C Y D S Q Q A N p R s v , , L 3 N R s s p l $ v , r c , s s p R 






Tuesday, 18 November 1997 13:57
fig 54 pLM1 (1 > 8285) Site and Sequence

GC TGCAGGCTGG TG ACGCGCCC TCTG TGGG TGGGAGCTGCCGCTCGGAGGGGACGCCCGCCTGGTACATGCACGGCGAACGGGCCCaC TAC FCCCACACC 

2600

CGACGTCCGACCACTGCGCGGGAGACACCCACCCTCGACGGCGAGCCTCCCCTGCGGGCGGACCATGTACGTGCCGCTTGCCCGGGTGATGAGGGrGTGG 



-insert pLM1

-ORF pLM1



l q a g 0 a p s vggscrsegtpavymhgeftahysht 

argcccatgcgcagccccagcaagctcagccatatctcccgcctggagctggtcgaatccctggacrcggat gaggtggacctcaagtccggctacatga 
tacgggtacgcgtcggggtcgttcgagtcggtatagagggcggacctcgaccagc ttagggacctgagcctactccacctggagttcaggccgatgtact 2700 



-insert pLM1

-ORF pLM1



H P M R S P S K L S H I S R L E L V E S U P S 0 EVDLKSGYM 

gcgacagtgacctcatgggcaagaccatgacggaggatgatgacatcactaccggc tgggatgaaagcagctccatcagtag tggac tc agc ga tgcctc 
cgctgtcactggagtacccgttctggtactgcctcctactactgtagtgarggccgaccctactttcgtcgaggtagtcatcacctgagtcgctacggag 



2800 



-insert pLM1

-ORF pLM1



S P S D L H G K X M T E 0001 TTGVDESSS ISSGLSDAS 

AGACAATCTCAGTTCAGAAGAATT CAATGCCAGCTCCTCACTCAACTCCCTCCCAAGTACTCCCACrGCTTCTCGCAGGAACTCAACAATAGTGCTACGC 

2900

TCTGTTAGAGTCAAGTCTTCTTAAGTTACGGTCGAGGAG TGAGTTGAGGGAGGGTTCATGAGGGTGACGAAGAGCGTCCTTGACTTGTTATCACGArGCG 



-insert pLM1

-ORF pLM1



O N L S S E E F N ASSSLNSLPS TPTASRRNST IVLR 

ACAGACTCAGAGAAGCGCTCACTGGCAGAAAGTGGGCTGAGCrGG TTTAGTGAATCAGAGGAGAAAGCCCCTAAAAAACTGGAGTACGACAGTGGTAGCC 
TGTC TGAGrCTCTTCGCGAGTGACCGTCTTTCACCCGACTCGACCAAATCACTTAGTCTCCTCTTTCGGGGATTTTTTGACCTCAT6CTGTCACCATCGG 



-insert pLM1

-ORF pLM1



T 0 5 £ K R S L A E S G L S V F S £ S E E K A P KKLErOSGS 

rGAAGATGGAACCTGGGACTTCrAAGTGGCGGAGGGAGCGGCCTGAGAGCrGTGATGATTCATCCAAGGGTGGAGA ACTGAAAAAGCCCATCAGCCTGGG 
ACTTCTACCTTGGACCCTGAAGATTCACCGCCTCCCTCGCCGGACTCTCGACACTACTAAGTAGGTTCCCACCTCrTGACTTTTTCGGGTAGTCGGACCC 3, °° 



-insert pLM1

-ORF pLM1



L K H E P G T S K V R R £ R PESCDOSSKGGELKKP ISLC 

CCACCCTGGTTCCCTGAAGAAGGGCAAGACCCCACCTGTGGCTGTAACTTCCCCCATCACTCACACAGCCCAGAGrGCCCTCAAA G TC GC AGGC AAACC T 
GGTGGGACCAAGGGACTTCTrCCCGTTCrGGGGTGGACACCGACATTGAAGGGGGTAGTGAGTGTGTCGGGTCTCACGGGAGTTTCAGCGTCCGTrTGGA 



-insert pLM1

-ORF pLM1



H P G S L K K C K T P P V A V T S P 1 THTAQSALKVAGKP 

GAGGGCAAAGCTACAGACAAGGGTAAGCTTGCAGTGAAGAATACTGGGCTCCAACGCTCCTCCTCTGATGCTGGTCGGGACCGCC TGAGTGATGCTAAGA 
CrCCCGTTTCGATGTCTGTTCCCATTCGAACGTCACTTCTTATGACCCGAGGTTGCGAGGAGGAGACTACGACCAGCCCrGGCGGACrCACTACGATTCT 33 °° 



-insert pLM1

-ORF pLM1



" G K * t okgklavkntglorsssoagrorlsoak 



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78/270 
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lig 54 pLMI (1 > 82B5) Site and Sequence 



-insert pLM1

-ORF pLM1



KPPSG1ARPSTSGSFGYKKPPPATGT 



ATVMQTGG 



rrCAGCCACTCTCAGCAAGATCCAGAAGTCCTCAGGCATCCCTGTCAAGCCAGTAAATGGGCGCAAGACTAG CTTAGATGTTTCCAACAGCGCAGAGCCA 
AAUTCGGTGAGAGTCGTTCTAGGTCTTCAGGAGTCCGTAGGGACAGTTCGGTCA^TACCCGCGTTCTGATCGAArCTACAAAGGTTGTCGCGTCTCGGT 35 °° 



-insert pLM1

-ORF pLM1



5 A T , L s K 1 Q K S 5 G I P V K P V N G R K TSLOVSNSAGP 

GGATTCCTCGCTCC yGGAGCCCGTTCTAACATCCAGTACCGCAGCCTGCCCCGGCCAGCCAAGTCAAGTTCrATGAGCGTGACCGGCGG GCGGGGTGGAC 
CCTAAGGACCGAGGACCTCGGGCAAGATTGTAGGTCATGGCGTCGGACGGGGCCGGTCGGTTCAGTTCAAGATACrCGCACTGGCCGCCCGCCCCACCTG 36 °° 



-insert pLM1

-ORF pLM1



G F L A P G A R S N I Q Y R S L P R P A K S S $ M S V T G G R G G 
CT-CGCCCTGTGAGCAGCAGCATTGACCCCAGTCTCCTCAGCACCAAGCAGGGAGGCCTTACGCCTTCCAGACTGAAGGAGCCTACCAAGGTAGCCAGTGG 



CAUCGGGACACTCGTCGTCGTAACTGGGG 



rCAGAGCAGTCGTGGTTCGTCCCTCCGGAATGCGGAAGGTCTGAC TTC C TC GGATGG TTCC ATCGGTC ACC 



3700 



-insert pLM1

-ORF pLM1



PRPVS 5S1PP SLLSTKQ GGL TP SRLKE PTK V A S G 

GCGGACCACTCCAGCCCCTGTCAATCAGACAGATCGGGAAAAGGAGAAGGCCAAAGCCAAGGCAGTGGCCTTGGACTCAGACAACATCTCCTTGA AGAGT 
CoCC TGGTGACC TCGGGGAC AGTTAGTCTGTCTAGCCCTTTTCCTCTTCCGGTTTCGGTTCCGTCACCGGAACCTGAGTC TGTTGTAGAGGAACTTCTCA 3& °° 



-insert pLM1



* T T PAPVNQTDRE 



-ORF pLM1



« E < A KAKAVALOSDN I SLKS 



-"TuGCTCCCC^GAGAGTAC ^CCCAAGAACCAAGCAAGCCACCCCACAGCCACCAAGCTGGCAGAGCTGCCACCAACCCC tctc AGGGCCAC aqcg aaga 
r^A^CGAGGGGTCTCTCArGAGGGTTCTTGGTTCGTTCGGTGGGGTGTCGGTGGrrCGACCGTCTCGACGGTGGTTGGGG AGAGTCCCGGTG TCGCTTCT 39 °° 




-ORF pLM1



GS P ESTPKN QA SHPTATKLAELPPTPL 



R A T A K 



GC'TTrGTCAfeACCACCCTCACrAGCCAATCTTGACAAGGTC AACTCCAACAGTCTGGArCTACCATCATCCAGTGATACCACCCATGCTTCAAAGGTCCC 

C JAAACAGTTT5GTGGGAGTGATCGGTTAG AAC TGTTCCAGTTGAGGTTGTCAGACC TAGATGGTAGTAGGTCACTATGG TGCGTACGAAGTTTCCAGGG ^ 



AGCCCCCCTCGGGCATTGCTCGCCCCTCCACTTCGGGATCCTTCGGCTACAAGAAGCCrCCrCCTGCCACAGG CACAGCCAC TG TC ATGC AA AC TGG TGG 
TCGGGGGGAGCCCGTAACGAGCGGGGAGGTGAAGCCCTAGGAAGCCGATGTTCTTCGGAGGAGGACGGTGrcCGTG TCGG TG AC AG T ACG T T TG AC C AC C 



FVKPP SLAN 



-ORF pLM1 



tOK V N5NSLDLPSSS0TTHAS 



&'jATCTCCATCC ^A^AAGCTCAGCATCTGGGGGCCCTCTCCCTTCCTGCTTCACCCCCAGTCCGGCACCCATCCTCAATATTAACTCAGCCAGCTTCTCC 
^TAGACGTACGATGTTCGAGTCGTAGACCCCCGGGAGAGGGAAGGACGAAGTGGGGGTCAGGCCGTGGGTAGGAGTTATAATrGAGTCG^ 1,1 °° 




V L H 



-ORF pLM1

TSSASGGPLPSCF TP 



S P A P 



LNINSASFS 






GrCCCGGACCTCGATTACTCACCAAAGTCACACGGTTTTCTCTGGGCGTACATGGGGTTTGAGAGTCCGGACGTGTCCT^TArrTr.^ 4,200 



TCCC TCCAGA TGC 
AGGGAGGTCTACG 




OGlELNSGFSvpKETftMvPifi * ' • 
■ ' . i "ri ET RriVPK LSG LHRSH E S L 0 (1 

CAATGAGCCTCCCCAGrGCCTrcCCCAGCAGTACTCCCGTCCCCACCCCACCTGCrCCCCCTGCTGCT C C C Ar A r.AAr. fl . 



lAGACACGGAAGAGCTGACTTG 



GTTACTCGGAGGGGTCACGGAAGGGGTCGTCATGAGGGCAGGGGTGGGGTGGACGAGGGGGACGACGAGGGTGTCTTCTTCTCTGCCTrCTCnArr^ M30 ° 




-ORF pLM1



PMSLP SA FPSST PVPTPPAPPAA 



P T £ E E T £ £ L T W 

GAGTGGAAGCCCCAGAGCTGGGCAAC TGGACAG TAATCAGCGGGArCGGAACACTCTTCCCAAGAAAGGCCTCAGGTACCAGCrTCAKT 



, U uv U .w^uu HiAtcrcTCATTAGTCGCCCTAGCCTTGTCAGAAGGGTTCTT 



GTCCCAGGACGAG 




SGSPffA GO L D 5 U 
ACCAAGCUGAGGCG ACATTCCCATACCATTGGTGGCCrGCCTGAATCCGArGACCAGTCAGAGCTGCCTTCrc CCCCTGCACT 

tggttcctctc^g^taagggtatggta^ccacccgacggacttaggctactggtcagIctcgacgga^aggggga^ 



TCCCATGTCTCT6AGTG 



TACAGAGACTCAC 



4500 




T * E R R H S H T I 



G G L p ESOOQSELPSP 



palpmsls 



CAAAGGGCCAACTTACCAACATAGTGAGTCCCACTGCGGCCACCACGCCAiiRA. 



GTTTCCCGCrTCAATCGTTCTArCACTCAGGGTCACCCCCG TGGTGCGCTTCTTAGTGGGCCAGGTTCTCGTAGGGGTCGGTfir 



iATCACCCGCTCCAACAGCATCCCCA CCCACGAGGCGGCC TTCGAGCT 

TCCGCCGGAACCTCCA 



t»600 




A * C Q L T N I 



VSPTAATTPRIT 



RSNSIPTMEA 



A F E L 



GT"ACAGCGuCTCCCAAATGGGGAGCACCC 



TGTCCCTGGCCGAGAGACCCAAGCGAATGATTCGGTCAGGATCCTrcCGAGACCC 



C'ATG rCGCCSAGGGTTTACCCCTCGTGGCACAGGGACCGGC TC yCTGGGTrcCCTTACTAAGCCAGTCCTA^S^A^GGCTC 



CACGGACGATGTTCAC 



rGGGGTGCC TGC TACAAGTG 



a 700 




4800 



«S V L S L A S 



sasstyssae e r m o s e 0 , R K L 



« R i L 



AATCATCCCAGGAAAAAGTCGCCACCTTCArnTrrr^rTT 



rCTGCCAATGCTAATCTCGTGGCTGC rTTTGAGCAGAGCCTGGTGAATATGACATrrfr. 



rrA G rAGGGTCCrrTTTCACCGGTGGAACrGCAGAGTCGAAAG_ACGGn^ 




4900 



SSQ E K V A T 



-ORF pLM1
L TSOLSANAN 



>■ V A A F I Q S L V N M J 



S R 






CAGGGCCTGGAGCrAArGAGrGGTTTCAGrGTGCCAAAAGAGACCCGCATGTACCCCAAACTCTCAGGCCTGCACAGGAGCATC^n 






rCTGAGGCCCAG 



GGACCCTGTGCACCGTCTCTGCCGGCrCCTCTT CCTGTGACrCGACGACCrAAACGCTCTTTGGTArCTGAAAGACrTCTTTTTcTTGAGAC TCCGGGTC 

-insert pLM1



5000 




lrhlaetaeekote 



llqlre r to flkkknseao 



CCAGT 



"CATTCafiWGCCCTTAAT^ 

CGTCAGTAAGTCCCTCGGGAATTACGGAGTCTTT GGJGJG^TCTTGAAGCCTAGrTCTCTGTTT TG^ 5100 




A V ' ° G A 1 N A 5 E T T » K e L R I k R 0 N S S D S I S S 
rCACrAGCCATTCCAGCAKGGCAGCAGCAAGGATGCTGATGCGAAAAAGAAGAAAA/i 



L N S 



AAAAAGAGTTGGGTCTATCAGCTrCGAAGTTCCTTCAACAAAGC 



AGTGATCGGTAAGGTCGTAGCCGTCGTCGTTCCTACGACTACGCTrTTTCrTCTTTTTTTTCTCAACCCAGAT^^ 



5200 




' T s HSSIGSSKO 



-ORF pLM1

ADAKK KKK K 5 V VVELR5SFNKA 



GTTCAGTA TAAA AAAGGGGCCCAAGTCAGCTTCCTCA TACTCGGA 



CAAGrCATATTTrTTCCCCGGGTTCAGTCGAAGGAGTATGAGCCTArATCTCCTCTAACGATGTGGGCTGAG: 



TAIAGA6GAGATTGCTACACCCGACTCTTCAGCCCCCTCATCCCCCAAACTACAG 



AAGTCGG G GGAG TA GG GGG T TTG A TGT C 



5300 




' 5 1 K K G P K S A 5 ^ S 0 1 £ E , A T p Q s s A p s s p ^ t Q 

CATGG 



TTCCACAGAGACTGCTTCACCCTCCATCAAGTCCTCCA 



GTACCAAGGTGTCrCTGACGAAGTGGGAGGTAGrrCAGGAGGTGi 



TTGTCCTCCGTGGGCACTGATGrCACCGAGGGCCCTGCTCACCCAGCCCCCCACA 



iAACAGGAGGCACCCGTGACTACAGTGGCTCCCGGGACGAGTGGGTCGGGGGGTl 



GT 



5400 




"GSTETASPS 



K S S T L S SVGTOVTEG 



P A H P A P H 



CTAGCCTG 



TTCCATGCAAATGAGGAGGAGGAGCCAGAGAAGAAGGAGGTArCGGAGCTGCGCTCTGAGCTATGGGAGAACn. 



GATCCCAC AAGGTACGTTTACrcCTCCTCC TCGCTCTCTTCTTCC TCCATAGCCTCCAC GCGAGACTCGArACCCTCTTCCTTTAC 



AAATGAAGC TTACAGACAT 



TTCGAA FGTCTGTA 



5500 




^ ft L F H A N E £ £ £ P E K K £ V S E L R 



StLVEKEMKiroi 



CCGC y^^^9^^^^^ A ^T^^^^^^^ A ^CTGGATCAGCTTCGGGAGACCATGCACAACATGCAGTTGCAGGrGGACC TGCTGAAAGCAGAGAATGAC 



GGCGAACC ^^^^^Q^Q^^GAGACGGGTGGTTGACCT AGTCGAAGCCCTCTGGTACGTGTTGTACGTCA ACC r cf jrrTnr.Arf.AfTTTrr 

-insert pLM1



TC TC1TACTG 




5600 



fi L E A L N S 



AHQLOOLftETMMNM 



OLEVOLLKAENO 



CGACfGAAGGTAGCCCCAGGCC^ 



TCCC TAGGCCTGGCAC 




5700 



* I K V a p G 



f* S S G S T P 



-ORF pLM1

0v pgssalssprr 



S L G L A 






CCTGCGACACCTGGCAGAGACGGCCGAGGAGAAGGACACTGAGCTGCTGGATTTGCGAGAAACCATAGACrrTCTGAAGAAAAAGAAC 



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tiQ54pLM1 ft > 6285) Site and S^m^- 



rC,CCC,TTCCTTC» W CCCCAGT C TT 6 C, ?A c A CAO A CC T G rCACCCATC 6 M C6 C,TCA CT ACTT G TC aT C C J ^ r ..^^., 
AGTGGGTAAGGAACCCGGGGTCACAACGTCTGTGrCTGCACAGTGGGTACCTACCGTAGTCArGAAf ArrA; 



TGACCCTCCGGGTGGT 



mGTTTCCTCCTTCACTGGGAJGCCCACCA 



5600 




L r HSFGPSLAO 



T ° L S " " 0 C 1 S T C G P K E E V T U R V y 

GGTGAGGArGCCCCCGCAGCACATCATCAAAGGGGACTTGAA GCAGCAGGAATTCTTCCTGGGCTGTAG CAAGGTCAKTnrtaA^ a/: 




CrGGATGAAGCTGrTTrC C A AGTGTTCAAGGACTATArTTCTAAAATGGArrrAr.rf Tr TA/ ' rr T ^ 'iGACTAAGCACTGAGTCCATCCATGGCTACAfif A 

" r~-~ ■ lkM ^AAGTTCCTGATATAAAGArTTTACCT G GGTCGGAGATGGG ACCCrGATTCGrGACTCAGGTAGGTA CCGATfrrrfrT 6000 

-insert pLM1




6100 



R GVNNI SV S L K 6 L K t 




GuCCCCAGCGGCACGGGCAAGACCTACCTGACCAATCGCTTCfirrnAfiTArrTge T „ rr . r 

C-.tiGGGTCGCCG IGCCCGfTCrGGArGGAC ^CGTTAGf fiAACCGGCTCATGGACCACCTCGCGAGACCnnr±T 



iACGTCACAGAGGGCATCGT CAGCACC T 
TCCAGTG TCTCCCGTAGCAGTCGTGGA 



6300 




F N H u n ORF pLMl . 

'-^JLL±^ ^QLYLSNLANQ 1 0 R £ TG I G D V P t V 

C -^^^^ „ r rr ; asoo 




1 L L D t> t S E A G 



-ORF pLM1



G S I S E L 



N G A L F c K 






* h k c y r i 






ACCAATCJGCCTGTAAAAATGACACCCAACCATGGCTT& 



ICAC 



TTGAGCTTCAGGATGTrGACCTTCrCCAACAACGTCr,A: 



rGUTrAGTCGGACATTTTTACTGTGGGTTGGTACCGAACGTGAACrCGAAGTCC^CAACTGGAAn^;^ 



CCAGCCAATGGCTT CCTGG 
TTGCACCTCGGTCGGTTACCGAAGGACC 



6600 




T * Q P V K M T P N H 



G l HLSFRHLTF 



S N N V E P 



A N G F L 



rrCGTTACCTCAGG AGGAAGCTGGTAGAG 

AAGCAATGGACTCCTCCrTCGACCArCTCAGTCrGTCGCrGTAGT 



TCAGACAGCGACArCAATGCCAACAAGGAAGAGC TGC TTCGGGTGC TCGAC 



TGGGTACCCAAGCTGTGGTA 




GPCFF LSCPIG 1 C 0 F R T 



TGGTTCATTGACCTGTGGAACAACTCTATCATTCCPT, 




6900 



^^^^ r 




— ■ V ^ W VRD TLPWP SAO Q OQ 5 KL 

^ WWW ^T. ^,,KMC^ «« re |^ crMm Ka ^ fH ,, , |i ^ 7.00 




* S ? P E 0 « 



T V " 0 S T P S S I D S 0 P L H 



ANLLKLOE 



*^gl£^^ 



7200 




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riqS4pLM1 (1>egS5) Site and Sequence Pa S e " 

AuoAGGGACAGGTTCTTGGTGCTGTACCTTTGAGAACTTCC TAGGAAGGAATGGTGGGG TGGCGTT TGGGAACTTGTGC CCCCTAAACACATTTACTGGC 

tcctccctgtccaagaaccacgacatggaaactcttgaaggatccttccttaccaccccaccgcaaacccttgaacacgggggatttgtgtaaatgaccg 7a00 



-insert pLM1



G G T GSV CC TF E NF L GR NGGVAF GNLCPLNTF r G 

CrcCTCTAATGACTTTgGGGAAAAGATGATTCTGGGTCrTTCCCTTGACTTCTTGTTTCAATTACAAACTCCTGGGCTTTCTGGGGAGGGGTTC AGAAAA 
GAGGAGATTACTGAAACCCCTTTTCTACTAAGACCCAGAAAGGGAACTGAAGAACAAAGTTAATGTTTGAGGACCCGAAAGACCCCTCCCCAAGTCrTTT 



7500 



in««»i pi m — 

. L V G K 0 Q S G S F P . L LVSITNSVAFVGGVO 



CATCAAAACACTGCAGCAGTTCCCCGGAATTCAGCTTGGACTTAACCAGGCTGAACTTGCTCAAAAGAAGCCGAArTCCAGCACACTGGCGGCC GTTACT 
GrAGTTTTGrGACGTCGTCAAGGGGCCTTAAGTCGAACCTGAATTGGTCCGACTTGAACGAGrTTTCTTCGGCTTAAGGTCGTGTGACCGCCGGCAATGA 7600 



-insert pLM1



TSKMC SS SPE fS LDLTftLNL tKRSRIPAHW RP L L 
AGTTCTAGAGCG GCCGCCACCGCGGTCGAGCTCCAATTCGCC^ 

^ AGATC ™*^^ 7700 

V L E P P P p R w S S N S P Y S E S Y Y A W S L A V V L 0 R R D y 

GAAAACCCTGGC GTTACCCAACTTAATCGCCTTGCAGCACATCCCCCTTTCGCCAGCTGGCGTAATAGCGAAGAGGCCCGCACCGATCGCCCTTCCCAAC 
CrTTTGGGACCGCAATGGGTTGAATTAGCGGAACGTCGTGTAGGGGGAAAGCGGTCGACCGCATTATCGCTTCTCCGGGCGTGGCTAGCGGGAAGGGTTG 7M ° 
CNPGV TQtNR LAAHP PFASW RNSEEARTORPSQ 



AGTTGCGCAGCC TGAATGGCGAATGGGACGCGCCCTGTAGCGGCGCATTAAGCCCGGCGGGTGTGGTGGTTACGCGCAGCGTGACCGCTACACTTGCCAG 
TCAACGCGTCGGACTTACCGCriACCCTGCGCGGGACArCGCCGCGTAAiTCGCGCCGCCCACACCACCAATGCGCGTCGCACTGGCGATGTGAACGGTi 7900 
QLRSL WGEWQAPCSGALSAAGV VVTR S V T A T L A S 

CGCCCTAGCGCCC GCTCCTTTCGCTTTCTTCCCTTCCTTTCrCGCCACGTTCGCCGGCTTTCCCCGTCAAGCTCTAAATCGGGGGC TCCCTTTAGGGTTC 
GCGGGATCGCGGGC6AGGAAAGCGAAAGAAGGGAAGGAAAGAGCGGTGCAA6CG6CCGAAAGGGGCAGTKGAGATTTAGCCCCCCAGGGAAATCCCAAG ^ 
ALAPAPFAFFPSF LATFA GFPRQ ALN RGL P L G F 

CGATTTAGTGCT rTACGGCACCTCCACCCCAAAAAACTTGATTAGGGTGATGGTTCACGTAGTGGGCCATCGCCCTGATAGACGGTTTTTCGCCCTTTGA 
GCTAAATCACGAAArGCCGTGGAGCTGGGGTTTTTTGAACTAATCCCAC TACCAAGTGCATCACCCGGTAGCGGGACTArCTGCCAAAAAGCGGGAAACT 0 '°° 
RF S A *-^HLOP KKL O . GDGSR SGPS P . . T V F R P L 

COT TGGAG rCCACGTTCTTTAATAGTGGAC ^CTTGTTCCAAACTGGAACAACACrCAACCCTATCTCGGTCrATTCTTTTGATTTATAAGGGATTTTGCC 

gcaacctcaggtgcaagaaattatcacctgagaacaaggIttgaccttgttgtgagttgggatagagccagataagaaaactaaatatt^ 8200 

fLEST FFN SC>L FOTGTTLNP 1SVV5F0 



GATTrCGGCCTATTGGTrAAAAAATGAGCTGATTTAACAAAAATTTAACGCGAATTTTAACAAAArATTAACGC 



C 



L . G 1 L P 
fTACAATTTAG 



rAAAGCCGGATAACCAATTrTTTACTCGACTAAATTGTTTTTAAATTGCGCTTAAAATTGTTTTATAATTGCGAATGTTAAATC **** 
1SAYVLKWEL [ 0 < F N A N F N K I t T L T t . 






Tuesday, 18 November 1997 13:57
fig 55 pCB251 (1 > 8197) Site and Sequence
Enzymes: All 146 enzymes (No Filter)

Settings: Linear, Certain Sites Only, Standard Genetic Code

GACGGATCGGGAGATCTCCCGATCCCCTATGGTCGACTCTCAGTACAATC TGC TCTGATGCCGCATAGTTAAGCCAG TATCTGC TCCCTGC TTGTG TGTT 

CTGCCTAGCCCTCTAGAGGGCTAGGGGATACCAGCTGAGAGTCATGTTAGACGAGACTACGGCGTATCAATTCGGTCATAGACGAGGGACGAACACACAA 
. T 0 R , E 1 S . R S P M V 0 S Q Y N L L . C R I V K P V S A P C L C V 

GGAGG TCGCTGAGTAGTGCGCGAGCAAAATTTAAGCTAC AACAAGGCAAGGCTTGACCGACAATTGCATGAAG AATCTGCTTA GGGTTAGGCGTTTTGCG 
CCTCCAGCGACTCATCACGCGCTCGTTTTAAATTCGATGTTGTTCCGTTCCGAAC TGGCTGT TAACGTACTTC TTAGACGAATCCCAATCCGCAAAACGC ^ 
G G R .- V V ^ E Q N L 5 Y N K A R L p R Q L H E E S A . G . A F C 

CTGCTTCGCGATGTACGGGCCAGATATACGCGTTGACATTGATTATTGAC TAGTTATTAATAGTAATCAATTACGGGG TCATTAGTTCATAGCCCATATA 
GACGAAGCGCTACATGCCCGGTCTATATGCGCAACTGTAACTAATAACTGATCAATAATTATCATTAGTTAATGCCCCAGTAATCAAGTATCGGGTATAT 
AA SRCTG Q IY ALTLflD . LL i y INYGVISS.PIY 

TGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCGCCCAACGACCCCCGCCCATTGACGTCAATAATGA CGTATGTTCCCATAG7 
ACCTCAAGGCCCAATGTATTGAATGCCATTTACCGGGCGGACCGACTGGCGGGTTGCTGGGGGCGGGTAACTGCAGTTATTACTGCATACAAGGGTATCA 
G VP ftY| TY6ICVPAVLT AQ RPPP| D VNNDv C S H S 

AACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGACTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTAC GCCC 
TTGCGGTTATCCCTGAAAGGTAACTGCAGTTACCCACCTGATAAATGCCATTTGACGGGTGAACCGTCATGTAGTTCACATAGTATACGGTTCATGCGGG 
NAM RDFPL TS MGG LFTVNCP LGS T SSVSYAKYA 

CCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATT^^ 
GGATAACTGCAGTTACTGCCATTTACCGGGCGGACCGTAATACGGGTCATGTACTGGAATACCCTGAAAGGATGAACCGTCATGTAGATGCATAATCAGT 

P Y • p Q • R . ■ marlalcpvhdlmglsyla vhlr I S H 

tcgctattaccatggtgatgcggttttggcagtacatcaatgggcgtggatagcggtttgac tcacggggatttccaagtctccaccccattgacg tcaa 
agcgataatggtaccactacgccaaaaccgtcatgtagttacccgcacctatcgccaaac tgagtgcccctaaaggttcagaggtggggtaactgcagtt 7:0 

Y H G D A V L A V H Q V A V I A V , L T G I S K S P P H , R Q 

TGGGAGTTTGTTT TGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACGCAAATGGGCGG^^ 

ACCCTCAAACAAAACCGiGGTTTTAGTTGCCCTGAAACGTTTTACAGCATTGTTGAGGCGGGGTAACTGCGTTTACCCGCCArCCGCACATGCCACCCTC 
V E F V L * ? * 5 T C L s K M s Q L R P I D A W G R . AC T V G 

G ^"^^^**^C AGAGCTC |^TGGCTAACTAGAGAACCCACTGCTTAC j^g^y j^^q^^j ^^^.^^^.^ ^ ACTATAGGGAGACCCAAGC TGGC TAGC 

cagatatattcgtctcgagagaccgattgatctcttgggIgacgaatgaccgaatagctItaattatgctgagtgatatccctctgggtIc^ 

L > 

T7 promoter priming site

CLYKQ SS LAN , RTHCLLArW W . r DS L. GDPS V L A 

GTTTAAACTTAAG CTTACCATGGGGGGTTCTCATCATCATCATCATCATGGTATGGCTAGCATGACTGGTGGACAGCAAATGGGTCGGGATCTGTACGAC 
CAAArTTGAATTCGAArCGTACCCCCCAAG AGTAGTAGTAGTAGTAGTA CCATACCGATCGTACTGACCACCTGTCGTTTACCCAGCCCTAGACATGCTG 

Probond binding domain
^KLK LTMGGSHHH HHHGMASMTGGQQMG R 0 L V 0 

GATGACGATAAGGTAC CCGGATCCTTCCGAGACCCCACGGACGATGTTCACGGCTCAGTGCTGTCCCT3GCCTCC AGTGCCfCCTCCACCr ACTCCTCA3 

ctactgctattc catgggc ctaggaaggctctggggtgcctgctacaagtgccgagtcacgacaggga;cggaggtcacggagga3gtgga^aggagtc ilK 
— '< < 

- 0 0 « V P G S F R 0 P T D p y h G S V L S L A S s A S S 7 * 3 3 
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fig 55 pCB2S1 (1 >8197) Site and Sequence 

C TGAGGAGAGGATGCAATC TGAGCAAATCCGGAAGCTTCGTAGGGAAC TGGAA TC ATCCCAGGAAAAAG TG GCCACCTTGACGTCTC AGCTTTC TGCCAA 
GACTCCTCTCCTACGTTAGACTCGTTTAGGCCTTCGAAGCATCCCTTGACCTTAGTAGGGTCCTTTTTCACCGGTGGAACTGCAGAGTCGAAAGACGGTT 



-pCB251 insert = U2

-U2 ORF



A E £ R M Q S E Q I R K L R R E L £ S S Q E K V A T L T 3 0 L S A M 

TGCTAATCTGGTGGCTGCTTTTGAGCAGAGCCTGGTGAATATGACATCCCGCCTGCGACACCTGGCAGAGACGG CCGAGGAGAAGGACACTGAGCTGCTG 
ACGATTAGACCACCGACGAAAAC TCGTCTCGGACCACTTATACTGTAGGGCGGACGC TGTGGACCGTCTCTGCCGGCTCCTCTTCCTGTGACTCGACGAC 



i; 



-pCB251 insert = U2

-U2 ORF



A N L y A A F E Q S L V N M T 5 RLRHLAETAEEKOTELL 

GATTTGCGAGAAACCATAGACTTTCTGAAGAAAAAGAACTCTGAGGCCCAGGCAGTCATrC AGGGAGCCCrTAATGCCTCAGAAACCACACCCAAAGAAC 
CTAAACGCTCTTTGGTATCTGAAAGACTTCTTTTTCTTGAGACTCCGGGTCCGTCAGTAAGTCCCTCGGGAATTACGGAGTCTTTGGTGTGGGTTTCTTo 



I HOC 



-pCB251 insert = U2

-U2 ORF



0 L R E 1 IDFLKKKNSEAQAVIQGALNASETTPKE 



TTCGGATCAAGAGACAAAACTCCTCAGATAGCATCTCAAGCCTCAACAGCATCACTAGCCATTCCAGCATCGGCAGCA GCAAGGATGCTGATGCGAAAAA 
AAGCCTAGTTCTCTGTTTTGAGGAGTCTATCGTAGAGTTCGGAGTTGTCGTAGTGATCGGTAAGGTCGTAGCCGTCGTCGTTCCTACGACTACGCTTTTT 



-U2 ORF



L B . 1 K R Q N S S 0 S I S S L N S t T S H 5 S I G SSKDADAKt 

GAAGAAAAAAAAGAGT TGGGTCTATGAGCTTCGAAGTTCCTTCAAC AAAGCGTTCAG TATAAAAAAGGG GC CC AAGTC AGC T TCC TC AT AC TCGGA TATA 
CTTCTTTTTTTTCTCAACCCAGATACTCGAAGCTTCAAGGAAGTTGTTTCGCAAGTCATATTTTTTCCCCGGGTTCAGTCGAAGGAGTATr t AGrrTATA7 



-U2 ORF



K K K , K 3 V v Y E L R 5 S FNKAFSIKKGPKSASSYSO I 

GAGGAGATTGCTACACCCGACTCTTCAGCCCCCTCATCCCCCAAACTACAGCATGGTTCTACAGAGACTGCTTCACCCTCCATCAAGTCCTC CACC TTGT 
CTCCTCT AACGATGTGGGC TGAGAAGTCGGGGGAGTAGGGGGTTTGATGTCGTACCAAGATGTC TCTGACGAAGTGGGAGGTAGTTCAGGASGTGGAAlA 



-U2 ORF

£ 1 A T p DSSAPSSPKLQHGSTETASPS IKS STL 
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CCTCCGTGGGCACTGATGTCACCGAGGGCCCTGCTCACCCAGCCCCCC ACACTAGGCTGrTCCATGCAAATGAGG AGGAGGAGCCAGAGAAGAASG AGGT 
GGAGGCACCCGTGACTACAGTGGCTCCCGGGACGAGTGGGTCGGGGGGTGTGATCCGACAAGG TACGTrTACTCCTCCTCCTCGGTCTCTTCTTCCTrrl 

-pCB251 insert = U2




S 3 V G T 0 V , T £ G ,P A H P A P H T R L FHANEEEcPEKKE 



ATCGGAGCTGCGCTC 



TGAGCTATGGGAGAAGGAAATGAAGCTTACAGACATCCGCTTGGAGGCCCTCAACTCTGCCCACCAACTGG 



iATC AGC TTCGGGAG 




S E L . R S E > V E < E M K L T D 1 R L E A L N S A H Q L D 0 L R E 

ACCATGCACAACATGCAGTTGGAGGTGGACCTGCTGAAAGCAGAGAATGACCGACTGAAGGTAGCCCCAGGCCCCTCATCAGGCTCC ACTCCAGGGCAGG 
TGGTACGTGTTGTACGTCAACCTCCACCTGGACGACTTTCGTCTCTTACTGGCTGACTTCCATCGGGGTCCGGGGAGTAG TCCGAGGTGAGGTCCCGTCC 



20CC 



-pCB251 insert = U2

-U2 ORF



THH NM QLEVD LLKA ENDRLKVAPGPSSGSTPGQ 



TCCCTGGATCATCTGCATTATCTTCCCCACGCCGCTGCCTAGGCCTGGCACTCACCCATTCCTTCGGCCCCAG TCTTGCAGACACAGACCTGT CACCCAT 
AGGGACCTAGTA GACGTAATAGAAGGGGTGCGGCGAGGGATCCGGACCGTGAGTGGGTAAGGAAGCCGGGGTCAGAACGT^ 

-pCB251 insert = U2



I I CC 



-U2 ORF



V ^ G S S A L S S P R R 5 L G L A L T H S F G P S L A D T D L S P M 

GGATGGCATCAGT ACrTGTGGTCCAAAGGAGGAAGTGACCCTCCGGGTGGTGGTGAGGATGCCCCCGCAGCAC ATCATCAAAGGGGAC^ 

CCTACCG rAGTC ATGAACACCAGGrTTCCTCCTTC AC TGGGAGGCCCACCACCAC TCCTACGGGGGCGTCGTG TA GTAGTTTCCCCTGAAC TTCGTCGTC 

-pCB251 insert = U2

-U2 ORF



D G [ - S T C G P K E EVTLRVVVRMPPQHIIKGDLKQO 



GAATTCTTCCT GGGCTGTAGCAAGGTCAGTGGAAAAG TTGACTGGAAGATGCTGGATGAAGC TGTTTTCCAAGTGTTCAAGGACTATA TTTCTAAAATGG 
CTrAAGAAGGACCCGACATCGTTCCAGTCACCTTTTCAACTGACCTTCTACGACCTACTKGACAAAAGGTTCACAAGTlcCTGATATAAAG ^ 



-pCB251 insert = U2 



-U2 ORF



F . LGC5KV SG KVOVKMLOEAVFQVFKOYI 
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tig 55 pCB251 (1 >B197) Site and Sequence 






ACCCAGCCTCTACCCTGGGACTAAGCAC TGAGTCCATCCATGGCTACAGCATCAGCCACGTGAAACGAGTGTTGGATGCAGAGCCCCCCGA GArGCCTCC 
TGGGTCGGAGATGGGACCCfGATTCGTGACTCAGGTAGGTACCGATGTCGTAGTCGGTGCAC TTTGCTCAC AACCTACGTCTCGGGGGGCTC TACGGAGG 



-pCB251 insert = U2

-U2 ORF



D P A S T L G L S T E S I H G Y S I SHVKRVLDAEPPEMPP 

TTGCCGTCGAGGTGTCAATAACATATCAGTCTCCCTCAAAGGTCTGAAGGAGAAATGCGTCGA CAGCCTGGTGTTCGAGACGCTGATCCCCAAGCCGATG 
AACGGCAGCTCCACAGTTATTGTATAGrCAGAGGGAGTTTCCAGACTTCCTCTTTACGCAGCTGTCGGACCACAAGCTCTGCGACTAGGGGTTCGGCTA^ ^ 



-pCB251 insert = U2



C « R G V H N I S V S L K G L KEKCVDSLVFETL I P K P M 

ATGCAGCACTACATAAGCCrCCTGCTGAAGCACCGGCGCCTCGTCCTC TCGGGCCCCAGCGGCACGGGC AAGACCTACCTGACCAATCCCTTGGCCGAGT 
TACGTCGTGATGTATTCGGAGGACGACTTCGTGGCCGCGGAGCAGGAGAGCCCGGGGTCGCCGTGCCCGTTCTGGATGGACTGGTTAGCGAACCGGCTCA 



-pCB251 insert = U2



-U2 ORF 



M Q H . Y I S L L L KHRRLVLSGPSGfGKTYLTNRLAE 



ACCTGGTGGAGCGCTCTGGCCGTGAGGTCACAGAGGGCATCGTCAGCACCTTCAACATGCACCAGCAGrCTTGC AAGGATCrGCAACTGTATCTTTCCAA 
TGGACCACCTCGCGAGACCGGCACTCCAGTGTCTCCCGTAGCAGTCGTGGAAGTTGTACGTGGTCGTCAGAACGTTCCTAGACGTTGACATAGAAAGGTT 



-pCB251 insert = U2

-U2 ORF



ylversgrevteg ivstfnmh 



QQSCKOLQLYLSM 



CCTAGCC AACCAGATAGACCGGGAAACAGGAATTGGGGATGTGCCCCTGGTGATTCTATTGGATGACCTGAGTGAAGCAGGC TCC A TCAGTGAGfTGGTC 
GGATCGGTTGGTCTATCTGGCCCTTTGTCCTTAACCCCTACACGGGGACCACTAAGATAACCTACTGGACTCACTTCGTCCGAGGrAGTCACTCAACCA^ 



-pCB251 insert = U2



" " -U20RF . 

L , A N , Q ' 0 , R E T G \ G D V P L V 1 L L 0 D L S E A G S I S E L V 

AATGGGGCCCTC ACCTGCAAGTATCATAAATGTCCCTATATTATAGGTACCACCAATCAuCC TGTAAAAATGACACCCAACCATGGC ttgc a cttgagct 
rTACCCCGGGAGTGGACGTTCATAGTATTTACAGGGATATAATATCCATGGTGGrTAGTCGGACATTTTTACTGTGGGTTGGfACCGAACGTGAACrCGA 




NGALTCKYHKCP 



-U2 ORF

YIIGTTNOPVKMTPNHGLHL 
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fig55pCB251 (1 >6197) Site and Sequence 






TCAGGAT3TTGACCTTCTCCAACAACGTGGAGCCAGCCAATGGCTTCC TGGTTCGTTACCTGAGGAGGAAGCTGGTAGAG TC AGACAGCGACATCAATGC 
AGTCC TACAACTGGAAGAGGTTGTTGCACC TCGGTCGGTTACCGAAGGACCAAGCAATGGAC TCCT CCTTCGACCATCTC AGTCTGTCGCTGTA-"TT^r". 




-U2 ORF 



F R M L T F S N N V E P A N G F L V R y L R R K L v £ s D s p { ^ ^ 
CAACAAGGAAGAGCTGCTTCGGGTGCTCGACTGGGTACCCAAGCTGTGGTATCATCTCCACACCTTCCTTGAG^ 

gttgttccttctcgacgaagcccacgagctgacccatgggttcgacaccatagtagaggtgtggaaggaactc ttcgtgtcgtggagtctgaaggagtag 3ICC 



-pCB251 insert = U2

-U2 ORF



NKEELLRVLDVVPK 



L V Y H L H T FLEKHSTSOFL ! 



ggcccttgcttct ttctgtcgtgtcccattggcattgaggacttccggacctggttcattgacctgtgg^^ 
ccgggaacgaagaaagacagcacagggtaaccgtaactcctgaaggcctggaccaagtaactggacaccItgttgagatagtaagggatagatg 



-pCB251 insert = U2 



Gp CFFLSCPlGIED 



-U2 ORF



f R T v FIDLVNNSI IPYLQE 



gagccaaggatgg gataaaggtccatggacagaaagctgcttgggaggacccagtggaatgggtccgggacacacttccctggccatcagccca 

CTCGGTTCCTACCCTATTTCCAGGrACCTGTCTTTCGACGAACCCTCCTGGGTCACCTTACCCAGGCCCTGTG TGAAGGGACCGGrAGTCGGGTTGTTCT 




GA ^DGIKVHGQK 



-U2 ORF



A A V E 0 P VEWVRDTLPWPSAQOD 



CCAATCAAAGCTGTACCACCTGCCCCCACCC ACCGTGGGCCCTCACAGCATTGCCTCACCTCCCGAGGATAGGACAGTCAAAGACAGCACCCC AAGTTCT 
GOT TAG T TTCGACATGGTGGACGGGGGTGGGTGGC ACCCGGGAGTGTCGTAACGGAG TGGAftfinr Ten z.rrr rr.rr .'.rTTTr t/» v^r-T^^^Yj^. .... I 




OStCLyHLP 



-U2 ORF



PPTVGPHSIASPPEOR 



K D 3 T P S 3 



CTGGACTCAGATCC TCTGATGGCCATGCrGCTGAAACTTCAAGAAGCTGCCAACTACATTGAGTCTCCAGATCGAGAAACCATCCTG 

GACCTGAGTCTAGGAGACTACCGGrACGACGACTTTGAAGTTCTTCGACGGTTGATGTAACTCAGAGGrCTAGCTCTTTGGTA^ X 



-pCB251 insert = U2



LDSDPLMAM 



-U2 ORF

LLKLQEAANYIES 



p 0 R E T ILDPML 



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aggcaaca c tttaagggttcggcaatc actgtcacccccggacagc agaacgctggcatc agctatcttagctcctcctc tcccctctcctctttc agag 
tccgttgtgaaattcccaagccgttagtgacagtgggggcctgtcgtcttgcgaccgtag tcgatagaatcgaggaggagaggggagaggagaaagtctc 



0. 



-pCB251 insert = U2 



-U2 ORF

0 a t l . gfgnhchprtaerwhols lllspllfq3 
cactggctctccagccccaggaggagaacaggagggaggaggagatgaaagaggagggacaggttcttggtgctgtacctttgagaacttcctaggaagu 

. 1 ■ 1 ■ 1 . 1 . 1 . 1 , 1 , , , , ^ 37i>: 

gtgaccgagaggtcggggtcctcctcttgtcctccctcctcctctactttctcctccctgtccaagaaccacgacatggaaactcttgaaggatccttcc 



-pCB251 insert = U2



TGSPAPGGEQEGGGQERGGTGSVCC TF 



CMC 



AATGGTGGGGTGGCGTTTGGGAACTTGTGCCCCCTAAACACATTTACTGGCCTCCTCTAGAGGGCCCGTTTAAACCCGCTGATCAGCCTCGACTGTGCCT 

' — 1 ' 1 ' i ■ ■ I 1 I 1 1 ■ — ■ 1 ■ t 1 i t 1 1 — ii. t 

TTACCACCCCACCGCAAACCCTTGAACACGGGGGATTTGTGTAAATGACCGGAGGAGATC TCCCGGGCAAATT TGGGCGACTAGTCGGAGCTGACACGGA 

> , 

-pCB251 insert = U2

N G G V A F G N L CPLNTFTGLL RARLNPL ISLDCA 

TCTAGTTGCCAGCCATCTGTTGTTTGCCCCTCCCCCGTGCCTTCCTTGACCCTGGAAGGTGCCACTCCCACTGTCCTTTCCTAATAAAATGAGGAAATTG 

' 1 ' 1 : ' 1 ....I 1 I 1 ' i i I i i h -lO£i' 

AuATCAACGG TCGGTAGAC AACAAACGGGGAGGGGGC ACGGAAGGAAC TGGGACC TTCCACGGTGAGGG TGAC AGGAAAGGATTATTTTACTCCTT tAAC 

F - LPA IC CLP LPRAFLOPGRCHSHCPFL I K . G N C 

CATCGCATTGTCTGAGTAGGTGTCATTCTATTC TGGGGGGTGGGGTGGGGCAGGACAGCAAGGGGGAGGATTGGGAAGACAATAGCAGGCATGCTGGGGA 
GTAGCGTAACAGACTCATCCACAGTAAGATAAGACCCCCCACCCCACCCCGTCCTGTCGTTCCCCCTCCTAACCCTTCTGTTATCGTCCGTACGACCCCT 
' A L S E . V SFYSGGWGGAGQQGGGLGRQ . Q A C V G 

TGCGGrGGGCTCTATGGCTTCTGAGGCSGAAAGAACCAGCTGGGGCTC TAGGGGGTATCCCCACGCGCCCTGTAGCGGCGCATTAAGCGCGGCGGGTGTG 
ACGCCACCCGAGATACCGAAGACTCCGCCTTTCTTGGTCGACCCCGAGATCCCCCATAGGGGTGCGCGGGACATCGCCGCGTAATTCGCGCCGCCCACAC 
C G G L Y G F GGKNQLGL . G V S P R A L R R IKPGGC 

GTGGTTACGCGCAGCGTGACCGCTACACrTGCCAGC GCCCTAGCGCCCGCTCCTTTCGCrTTCTTCCCTTCCTTTCTCGCCACGTTCGCCjGCrTTCCC: 
CACCAATGCGCGTCGCACTGGCGATGTGAACGGTCGCGGGATCGCGGGCGAGGAAAGCGAAAGAAGGGAAGGAAAGAGCGGTGCAAGCGGCCGAAAGGG^ 
GG YAQRQ RYT CQRPSARSFRFLPFLSRHVR3LSF 

G 7CAAGC TCTAAATCGGGGCATCCCT 7TAGGGTTCCGATTTAGTGC TTTACGGCAC C TCGACCCCAAAAAACTTGATTAGGGTGATGGTTC ACGTAGTG^ 
CAG7TCGAGATTTAGCCCCGTAGGGAAATCCCAAGGCTAAATCACGAAATGCCGTGGAGCTGGGGTTTTTTGAACTAATCCCACTACCAAGTGCATCACC 

3 S s K s GHPFRVPI CFTAPRPQKT LG . V F T Vf 
" ' » - ■■■<.. i . ■ , . i . . .. . i . , . i i . . 

GCC ATCGCCCTGATAGACGGTTTTTCGCCCTTTGACGTTGGAGTCC ACGTTCTTTAATAG tggac tcttgttccaaactggaacaac ac~c aaccctatc 
CGGTAGCGGGACTATCTGCCAAAAAGCGGGAAACTGCAACCTCAGGTGCAAGAAATTATCACCTGAGAACAAGGTTTGACCTTGTTGTGAGTTGGGATAG 
A IA L IDGFSP F 0 V G V H V L . . V T L V P N W N N T 0 P Y 

yGTATTCTTTTGATTTATAAGGGATTTTGGGGATT TCGGCCT AfTGGTTAAAAAArGAGCTGATTTAAC AAAAA TT TAACGCGAAT TAA T"fC TGTlj 
AGCC AGA TAAGAAAAC TAAAFATTCCC TAAAACCCCT AAAGCCGGA TAACC AA TTTTTTAC TCGAC TAAATTG TTTTTAAAT TGCGC TTaaT TAAGAC AC 
G L F F F 1R 0 F GDFGL LVKK . A D L T K ( R E '_ I L V: 
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GAATGTGTGTCAGTTAGGGTGTGGAAAGTCCCCAGGCTCCCCAGGCAGGCAGAAGTATGCAAAGCATGCATCTCAATTAGTCAGCAACCAGGTGTGGAAA 

4600

CTTACACACAGTCAATCCCACACCTTTCAGGGGTCCGAGGGGTCCGTCCGTCTTCATACGTTTCGTACGTAGAGTTAATCAGTCGTTGGTCCACACCTTT 
NVCQLGCGKSPGSPGRQKYAKHASQLVSNOVWK 

GTCCCCAGGCTCCCCAGCAGGCAGAAGTATGCAAAGCATGCATCTCAATTAGTCAGCAACCATAGTCCCGCCCCTAACTCCGCCCATCCCGCCCCTAACT 

4700

CAGGGGTCCGAGGGGTCGTCCGTCTTCATACGTTTCGTACGTAGAGTTAATCAGTCGTTGGTATCAGGGCGGGGATTGAGGCGGGTAGGGCGGGGATTGA 

VPRLPSRQKYAKHASQLVSNHSPAPNSAHPAPN 
CCGCCCAGTTCCGCCCATTCTCCGCCCCATGGCTGACTAATTTTTTTTATTTATGCAGA6GCCGAGGCCGCCTCTGCCTCTGAGCTATT CCAGAAGTAGT 

4800

GGCGGGTCAAGGCGGGTAAGAGGCGGGGTACCGACTGATTAAAAAAAATAAATACGTCTCCGGCTCCGGCGGAGACGGAGACTCGATAAGGTCTTCATCA 
SAQFR PFSAPVLTNFFYLCRGRGRLCL . A ! P E V v 

GAGGAGGCTTTTTTGGAGGCCTAGGCTTTTGCAAAAAGCTCCCGGGAGCTTGTATATCCATTTTCGGATCTGATCAAGAGACAGGATGAGGATCGTTTCG 

4900

CTCCTCCGAAAAAACCTCCGGATCCGAAAACGTTTTTCGAGGGCCCTCGAACATATAGGTAAAAGCCTAGACTAGTTCTCTGTCCTACTCCTAGCAAAGC 

RRLFVRPRLLOKAPGSLY IHFR I . SRORMR IVS 

CATGATTGAACAAGATGGATTGCACGCAGGTTCTCCGGCCGCTTGGGTGGAGAGGCTATTCGGCTATGACTGGGCACAACAGACAATCGGCTGCTCTGAT 

5000

GTACTAACTTGTTCTACCTAACGTGCGTCCAAGAGGCCGGCGAACCCACCTCTCCGATAAGCCGATACTGACCCGTGTTGTCTGTTAGCCGACGAGACTA 

HD TRVI ARRFSGRLGGEAtRL LGTTONRLL 

GCCGCCGTGTTCCGGCTGTCAGCGCAGGGGCGCCCGGTTCTTTTTGTCAAGACCGACCTGTCCGGTGCCCTGAATGAACTGC AGGACGAGGCAGCGCGGC 

5100

CGGCGGC ACAAGGCCGACAGTCGCGTCCCCGCGGGCCAAGAAAAAC AGTTCTGGCTGGACAGGCCACGGGACTTACTTGACGTCC TGCTCCGTCGCGCCG 

CRRVPAVSAGAPGSFCODRPVRCPE . TAGRGSAA 

TATCGTGGCTGGCC ACGACGGGCGTTCCTTGCGCAGCTGTGCTCGACGTTGTCACTGAAGCGGGAAGGGAC TG6C TGCTATTGGGCGAAGTGCCGGGGCA 

5200

ATAGCACCGACCGGTGCTGCCCGCAAGGAACGCGTCGACACGAGCTGCAACAGTGACTTCGCCCTTCCCTGACCGACGATAACCCGCTTCACGGCCCCGT 

I VAGHDGRSLRSCARRCH S G K G L A A I GRSAGA 


GGATCTCCTGTCATCTCACCTTGCTCCTGCCGAGAAAGTATCCATCATGGCTGATGCAATGCGGCGGCTGC ATACGCTTGATCCGGCTACCTGCCC attc 

5300

CCTAGAGGACAGTAGAGTGGAACGAGGACGGC TCTTTCATAGGTAGTACCGACTACGTTACGCCGCCGACG TA TGCGAAC TAGGCCGATGGACGGGTAAG 
GSPVtSPCSCRES IHHG . C N A A A A Y A . S G Y L P I 

GACCACC AAGCGAAAC ATCGCATCGAGCGAGCACGTACTCGGATGGAAGCCGGTCTTGTCGATCAGGATGATC TGGACGAAGAGCATCAGGGGC TC GCGC 

CTGGTGGTTCGCTTTGTAGCGTAGCTCGCTCGTGCATGAGCC TACC TTCGGCCAGAACAGCTAGTCCTACTAGACCTGCTTCTCGTAGTCCCCGAGCGCG 
RPPSE TSHRASTYSOGSRSCRSG SGRRASGARA 

CAGCCGAACTGTTCGCCAGGCTCAAGGCGCGCATGCCCGACGGCGAGGATCTCGTCGTGACCCATGGCGATGCCTGCTTGCCGAATATCATGGTGGAAA4 

GTCGGCT TGACAAGCGGTCCGAGTTCCGCGCGTACGGGCTGCCGCTCCTAGAGCAGCACTGGGTACCGCTACGGACGAACGGCTTATAGTACCACC tttt 

3stvrqaqgaharrrgsrrdpwrcllae y h g g * 
tggccgcttttctggattcatcgactgtggccggctgggtgtggcggaccgctatcaggacatagcgttggctacccgtgatattgctgaagagcttgg: 

5600

accggcgaaaagacctaagtagctgacaccggccgacccacaccgcctggcgatagtcctgtatcgcaaccgatgggcactataacgacttctcgaaccg 

vplfvihrlupagcggplsghsvgyp . YC R A V 

GGCGAAT3GGCTGACCGCTTCCTCGTGCTTTACGGTATCGCCGCTCCCGATTCGCAGCGC ATCGCC TTCTATCGCCTTCTTGACGAGTTCTTCTGAGCG3 

5700

CCGCT TACCCGACTGGCGAAGGAGCACGAAATGCCATAGCGGCGAGGGCT AAGCGTCGCGTAGCGGAAGATAGCGGAAGAAC TGCrCAAGAAGACTCGCC 

RRMG. PLPRALRYRRSRFAAHRLLSPS. R V L L S '■* 
GAC TC TGGGGTTCGAAATGACCGACCAAGCGACGCCCAACCTGCCATCACGAGATTTCGATTCCA CCGCCGCCTTCTATGAAAGGTTGGGCTTCGGftATC 
CTGAGACCCCAAGCTTTACTGGCTGGTTCGCTGCGGGTTGGACGGTAGTGCTCTAAAGCfAAGGTGGCGGCGGAAGATACTTTCCAACCCGAAGCCTTAo ^ 
T > G F E M T D Q A T P N L PSROFDSTAAFYERLGFG I 

GTTTTCCGGGACGCCGGCTGGATGATCCTCCAGCGCGGGGATCTCATGCTGGAGTTCTTCGCCCACCCCAACTTGTTTAT TGCAG 

CAAAAGGCCCTGCGGCCGACCTACTAGGAGGTCGCGCCCCTAGAGTACGACCTCAAGAAGCGGGTGGGGTTGAACAAATAACGTCGAATATTACCAATGT ^ 
V F R 0 A G V M i L Q R G 0 L MLEFFAHPNLF IAAYNGY 

AATAAAGCAATAGCATCACAAATTTCAC AAATAAAGCATTTTTTTCAC TGCATTCTAGT TGTGGTTTGTCCAAACTCATCAATGTATCTTATCATG TCTG 
TTATTTCGTTATCGTAGTGTTTAAAGTGTTTATTTCGTAAAAAAAGTGACGTAAGATCAACACCAAACAGGTTTGAGTAGTTACATAGAATAGTACAGAC ° 00 * 
* • 5 N S I T N F T N K A F F S L H S 5 C G L 5KLINV3YHVC 

TATACCGTCGACCTCTAGCTAGAGCTTGGCGTAAfCATGGTCATAGCTGT 

ATATGGCAGCTGGAGATCGATCTCGAACCGCATTAGTACCAGTATCGACAAAGGACACACTTTAACAATAGGCGAGTGTTAAGGTGTGTTGTATGCTCGG 
1 P S , T S S . 5 L A . S V S . L F P V NCYPLTIPHNIRa 

GGAAGCATAAAGTGTAAAGCCTGGGGTGCCTAATGAG TGAGC TAAC TCACATTAATTGCGTTGC GCTCACTGCCCGCTTTCCAGTCGGGAAACCTGTCGT 
CCTTCGTATTTCACATTTCGGACCCCACGGATTACTCACTCGATT6AGTGTAATTAACGCAACGCGAGTGACGGGCGAAAGGTCAGCCCTTTGGACAGCA 
G 3 ^ < C X ft V G A . . V S . L T L 1 A LRSLPAFOSGNLS 

GCCAGCTGCATTAATGAATCGGCCAACGCGCGGGGAGAGGCGGTTTGCGTATTGGGCGCrcrTCCGCTTCCTCGC 

CGGTCGACGTAATTACTTAGCCGGTTGCGCGCCCCTC TCCGCCAAACGCATAACCCGCGAGAAGGCGAAGGAGCGAGTGACTGAGCGAC GCGAGCCAGCA 
C Q . L ? • - I GQR AGRGGL R IGRSSASSLTOSLRSVV 

TCGGCTGCGGCGAGCGGTATCAGCTCACTCAAAGGCGGTAATACGGTTATCCACAGAATCAGGGGATAACGCAGGAAAG AACATGTGAGCAAAAGGCCAG 
AGCCGACGCCGCTCGCCATAGTCGAGTGAGTTTCCGCCATTATGCCAATAGGTGTCTTAGTCCCCTATTGCGTCCTTTCTTGTACACTCGTTTTCCGGTC ^ 
R L R R A V S A H S K A V j R L S T E S G 0 N A G K N M . A K G 0 

CAAAAGGCCAGGAACCGTAAAAAGGCC6CGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCC TGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGG T 
GTTTTCCGGTCCTTGGCATTTTTCCGGC6CAACGACCGCAAAAAGGTATCCGAGGCGGGGGGACTGCTCGTAGTGTTTTTAGCTGCGAGTTCAGTCTCCA °° U 

Q K A R N R K K AALLAFFHRLRPP0EHHKNRRS5QP 

GGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGC TCTCCTGTTCCGACCCT GCCGCTTACCGGATACCTGTZ 
CCGCTTTGGGCTGTCCTGATATTTCTATGGTCCGCAAAGGGGGACC TrCGAGGGAGCACGCGAGAGGACAAGGCTGGGAC GGCGAATGGCC TATGGAC AG 
V R N P f G L . R Y Q A F P P G S S L V R S PVPTLPLTGYL.3 

CGCCTTTCTCCC ^^^^^GAAGCGTGGCGCTTTCTCAATGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCT GTGTGCAC 
GCGGAAAGAGGGAA6CCCTTCGCACCGCGAAAGAGTTACGAGTGCGACATCCATAGAGTCAAGCCACATCCAGCAAGCGAGGTTCGACCCGACACACGTG ^ 
A FL P5G SVALSQC SRC RY L5SV. VVRSKLGCVH 

GAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCG CCAC TGGC AGCAGCCACTG 
CTTGGGGGGCAAGTCGGGCTGGCGACGCGGAATAGGCCATTGATAGCAGAACTCAGGTTGGGCCATTCTGTGCTGAATAGCGGTGACCGTCGTCGGTGAC ^ 
E P P V Q P D R C A L S G M Y R L ESNPVRHQLSPLAAAT 

G ^AAC AGGATTAGCAGAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGGA CAGTATTTGGfATC TG 
CATTGTCCTAATCGTCTCGCTCCATACATCCGCCAC3ATGTCTCAAGAACTTCACCACCGGATTGATGCCGATGTGATCTTCCTGTCATAAACCATAGAC 
GN RISR ARYV G GA TEFLKWUPNYGYTRRTVF G1C 
CGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCT GGTAGCGGTGGrTTTTTTGTTTGCAAGCAc 
GCGAGACGACTTCGGTCAATGGAAGCCTTTTTCTCAACCATCGAGAACTAGGCCGTTTGTTTGGTGGCGACCATCGCCACCAAAAAAACAAACGTTCGTC *^ 
A L L K P V T F G K RVGSS SGKQTTAGSGGFFVCKQ 

CAGATTACGCGC AGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGG TCTGACGCTCAGTGGAACGAAAAC TCACGTTAAGGGATTTTGG 
GTCTAATGCGCGTCTTTTTTTCCTAGAGITCTTCTAGGAAACTAGAAAAGATGCCCCAGACTGCGAGTC ACCTTGCTTTTGAG TGCAATTCCC TAAAACC 
Q | T R R K K G S Q S D P L IFSTGSOAQWNENSR . GIL 

TCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAA TCAATCTAAAGTATATATGAG TAAACTTGG TC TGACA3 
AGTAC TCTAATAGTTTTTCCTAGAAGTGGATCTAGGAAAATTTAATTTTTACTTCAAAArTTAGTTAGATTTCATATATACTCAT 7TGAACC AGAC TGTC ^ 
V W R L S K R I F T . | L L N.K.SFKSt.SlYE.TVSDS 

TTACCAATGCTTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGTTCATCCATAGTTGCC TGACTCCCCGTCGTGTAGATAACTACGATACGG 
AATGGTTACGAATTAGTCACTCCGTGGATAGAGTCGCTAGACAGATAAAGCAAGTAGGTATCAACGGACTGAGGGGCAGCACATCTATTGATGCTATGCC 
Y Q C L I S E A P I SAICLFRSSIVA.LPVV ITTIR 

GAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGACCCACGCTCACCGGCTCCAGA TTTATCAGCAATAAACCAGCCAGCCGGAAGGGCCG 
CTCCCGAAT6GTAGACCGGGGTCACGACGTTACTATGGCGCTCTGGGTGCGAGTGGCCGAGGTCTAAATAG TCGTTATTTGGTCGGTCGGCCTTCCCGGC 
E G L P S G P S A A M [ P R 0 P RSPAPDLSAI NQPAGRA 

AGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCC AGTCTATTAAJ TGTTGCCGGGAAGCTAGAG TAAGTAGTTCGCCAGTTAATAGTTTGCGCAA 
TCGCGTCTTCACCAGGACGTTGAAATAGGCGGAGGTAGGTCAGATAATTAACAACGGCCCTTCGATCTCATTCATCAAGCGGTCAATTATCAAACGCGTT 
ERRSGPATLSAS I QS INCCREARVSSSPVNSLRN 



CGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGC TCGTCGTTTGG TATGGCTTCATTCAGCT CCGGTTCCCAACGArCAAGGCGAGfTACATGATCC 
GCAACAACGGTAACGATGTCCGTAGCACCACAGTGCGAGCAGCAAACCATACCGAAGTAAGTCGAGGCCAAGGGTTGCTAGTTCCGCTCAATGTACTAGG 
VVA lATGIVVSRSSFGMASFSSGSORSRRVT.S 



cccatgttgtgcaaaaaagcggttagctccttcggtcctccgatcgttgtcagaagtaagttggccgcagtgttatcactc atggttatggcagcactg 

GGGTACAACACGTTTTTTCGCCAATCGAGGAAGCCAGGAGGCTAGCAACAGTCTTCATTCAACCGGCGTCACAATAGTGAGTACCAATACCGTCGTGACG 
PML CKKAVSS FG PPIVVRSKLAAVLSLMVMAAL 

ATAATTC TCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGAC TGGTGAGTACTCAACCAAGTCATTC TGAGAATAGTGTATGCGGCGACCGAGTTG 
TATTAAG AGAATGACAGTACGGTAGGCATTCTACGAAAAGACAC TGACCACTCATGAGTTGG TTCAG TAAGAC tcttatcacatacgccgc tggctcaac 
H N . S L r v M P S V R C F S V T G E Y S TKSF.E.CMRRPSC 

ctcttgcccggcgtcaatacgggataataccgcgccacatagcagaactttaaaagtgctcatc attggaaaacgttcttcggggcgaa 
gagaacgggccgcagttatgccctattatggcgcggtgt atcgtcttgaaattttcacgagtagtaaccttttgcaagaagccccgcttttgagag ttcc 



scpasirdntaphsrtlkvli I 



gkrssgrklsr 



atcttaccgctgttgagatccagttcgatgtaacccactcgtgcacccaactgatcttcagcatc ttttactttcaccagcgtttctgggtgagcaaaaa 
tagaatggcgacaactctaggtcaagctacattgggtgagcacgtgggttgactagaagtcgtagaaaatgaaagtggtcgcaaagacccactcgttttt SJw '~ 

I L P L L R S S S M . P T R APN.SSASFTFTSVSG.AK 
CAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATAC TCTTCCTTTTTC AATATTATTGAAGC ATTTATCAGGG 

gtccttccgttttacggcgttttttcccttattcccgctgtgcctttacaacttatgagtatgagaaggaaaaagttataataacttcgtaaatagtccc 

T 3 R ? NAAKK- G IRATRKC IL ILFLFQYY . .3 I Y 0 (i 
m TTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATA6GGGTT CCGCGCACATTTCCCCGAAAAGTGCCACC TGACGTC 
AATAACAGAGTACTCGCCTATGTATAAACTTACATAAATC TTTTTATTTGTTTATCCCCAAGGCGCGTGTAAAGGGGCTTT TCACGGTGGACTGCAG 
Y C L M S G Y I F E C I . K N K Q I GVPRTFPRKVPPDV 
10 20 30 40 50 60 70 

AAGCTTGCATGCCrGCAGGAATTCGATATCAAGCTTATCGATACCGTCGACC; r CG . * G G A TC A G A AG A A A T~ /O 
i CjGAGCAACTACCCACATCCATTATGCCACCCGCGGTTTCTAAGTGAGTTTAATTTTGAGTT I'ACGAf* TA 100 
CAAAAATGTG1" I C I* I* !'AA fAAC I'A I C ! I'CGAC J I'GAG FC I'A I* rtTGTATGAC f AGT TG TTG AG TC A TT YT 2 10 
I'CAT rGAGAAAATATTAAAAGGAACATTATTTACTTTGCTTATTTGCCCTAACTTTGAr f TAG FTTTTT 2^0 
A rCAACTAGATCTTACAAAACTTGCAATACAATTCCATTTTCAGATTACCCTCGC^ACGTGTCGCCACGr 360 

360 370 380 390 400 410 420 

CAGnAA(;C(K;TTCAGCAACTAACCCAAA~CCAACTTTCCACAAATGTCAACATCt;AC,Gr f TCAGAPTCr U'K) 
AC AG TC AAGAA7 A 7CG A A AATT6GTAAGAATTTTATT77GAGC7C A AAC TTG PA7AAAA7CCCCAG AA AA ^90 
CAAGA I oA I AAAAA TG TAG T TTTTT T GCAAAACTTCCACCTTTATTGCTC TAATA 'GACGGCTTATATCT SGO 
CAATTTTCTTGAGTTTTATCAAAAAATTTTCCACTATACAAATGTAGAAAAGTA P" T fGCACAAATTTTG 030 
ICAGTTGACAGC TTTGTAATAGA7CCAAATGGAACCTAGA7ACAAGC CGT TAAAG"GGAAGGACCGCAAG 700 

710 720 730 740 750 760 770 

TC I'A ' ACTGGAAATAATGATCTGAAAC AAATTTGTGCTATTCTCAAAi'G T TTAAG^CATGTTTTGAAGA T / /O 
T77TTCAAATTCGCAC f AG T TTC AG AA CC T~C CTT TTTGT ATG AA A AAGT AAA AAmAA AC TA7TTC A A AC Wirt 
CCTCACCGCCACCAfG rTTCAACTCTTAATTTTTATAAAATTTTCCAATTTACAAM ICGCC TCCCCT^GC 9*iO 
CCGAAAAGTGCCCACCAAAAf GAAP ITC IT G6CTTCATAATGACTTTTAAATTGA"GTGAGAAAAf; AC AG QflO 
^ASAGnPTAAPTAAA I' c f • a p a r. r*a a r f\ f; f - t tccc tc T T p TP pp T pp tt p tc C C ' * C C fCC TCCT7CCGT lOW 

1060 1070 1080 1090 1100 1110 1120 

TCCATCTCCAACAACAACAA I* I* *' TC IIAAT r TCGTTGTCCATTTTGCTTATAAACA"TfGTG TG TCC AAGG I 120 
AAACTACACGGGGAGACGGTCAA~AATTCGAATGAoAGCATGGCAATTAG KCr f CGGAA A 7 T G A 7 G A A 1 J 00 
I AAASAT AGAGCCGATGACACTG6C TGGTAGTAGTATGAG TG TAGAATTGCTTTT' CATCGTCTCAAC IT 1260 
GCGCA I'GAG f C 7 T C C C C C GC 7 C 7C A 7C AC 7 C AC A A T 7 A A 7G 7C GG G r TTTA TGCGC: T CTTTCCTATTCCG 1330 
CCAC rCATTCTGGGTTACCACAAACTGGAA I'ACA 'IT f i'AC rACTATTCAAGCCAT"TATTTTGA TA r TTA * 000 

1410 1420 1430 1440 1450 1460 1470 

ATTTTGTGC AATTAGGGATAAACACGACTT fTAAA AGTTTATTTA AAAAA ACG ATi. I* " TTCGATTTTAAA I 4/0 
AAATCTGAAAAG f" r rCAAAAAATCAA"AAATATTCCCTAACAAAT TGTATGGCTAaAATTTTA T 7 f C TAC ' SCO 
TG TTGACAA l*A TC !' T T A T A T G T ATC AC TG T TT TC C A TC TC A A A AC C T TGAATCCCC CAAGTTA TAGGAAG i 6 \0 
CTCCGTCTCACATTTCCCArGC TATGiATCCCTACTCAGCACATATCCAAAAATT/.AGCTAGACMTrGA 1680 
I A A TTA i TGGGCACGCGTAATAAAG rGCAAGCAGTTAGAATTTTAATTCAAGCACAGATTATCTA rCAAA I 750 
1760 1770 1780 1790 1800 1810 1820 

n'CAArCTTTGAACATiCAGCCAGTTCGVACAATr TTCCATGCTTTTTCGCCCAT? AAAAAACTTTCTpA Ul?0 
CC PCTTCATCCATCTCACTCG I'A f"C ATAAAAAG7ATAGCAAAAGCCCGAC TC rAC 1 TTTTAAGAGAAGGA 
(iATACTGAGCCACATGGCGTGTGACCC7TTTCArCTCG ICCG7TCGG7C7CAAAT1 CACGC rCATACTAA 1980 
CTCTTCAAArAGCCATAGACCTCCTTG7-TTr-TTCTrCGTTTTGACTCGCC.CC l*A I TTTTTGTGGC TGCC ^Ol'iC 
re^AAGCCGSGAAAArTTAGTATAT'TA'GAGCTTA^CTTrATGCAATACATAAA/ AACGAGGCAATTTA 210C 
2110 2120 2130 2140 2150 2160 2170 

AAA A 7 A f 7AAAATTAATGAGG7TG rAGAT5"AGATTTGGAAAACAAGAAAAAAAC/ AAACAAATAGGAAC ? I /O 
to vCAGATCAAAA !" I"C I'A rTTAAAGC!"TTTCAAGAT6TT TAGGCAAGATTCGGCTC AACAGAAAACTGAA 22-0 
!» T vCC i CCA f AAA j'C l AGTCTAACGT'TAGATTGAACTCGGAAATCCTAAGCCTG/ ACTATAGCr l" I'A rr 23 m 
v T AGA rCTTAGTTGCCC A TAAGCTC AA'oCCCAACC AGAAA FGACTTGC AT TTAC T l I AAGCC f AGATTGA 23<"!0 
C riGfrrGCTTCAGTCTAATCCAGACrAGATTTCCAAGAGAGrTTTCAATTTTAA/ l*G » rrcCA^TTTCT 2450 
2460 2470 2480 2490 2500 2510 2520 

TGTTACTTAAAA7CTTAATGCCC FG FGATGCGTAAAATCG7TATCCCTTTC7CTCACACTTTC AATTACA 2520 

GA r i'CA TCAAAGA I* FGGTATCAAGCCAAAG ACGTCTGGAC \ f AAACCACCC FC AIC ATCAACCAC f f C A I" 2590 

CAAATAATACAAATTCATTCCQTCCGTCGAGCCG F FCGAG TGGCAATAATAATGTTGGC7CGACGA T ATC 2b<50 

CACATCTGCCAAGAGCT7A3GTATCCGATCCTTCCGGCTTCTTTT7AGAAAT FA TA T FA FT rCAGAA TCA 5730 

TCATCAACGTACAGCTCTATTTCGAArCTAAACCGACCTACCTCCCAACTCCAAAAACCTTCTAGACCAC 2S00 

2810 2820 2830 2840 2850 2860 2870 
AAACCCAGCTAGTTCGTGTTGCTACAACTACAAAAATCGGAAGC TCAAAGC I'AGAGGA I'CCCCGGGA I' \'G 2870 
GCC AAAG6ACCCAAAGG I A I G I I I CGAA I GA ! AC ! AACA I AACA I AGAAC A I I I ICAUGAIKJACLC i i IM 'Am) 
AGGGTACCGGTAGAAAAAATGAGTAAAGGAGAAGAACTTTFCAC rCGAGT TG FCCCAAF FC I'i'G ! CGAA i 3010 
TAGATGGTGATGTTAATGGGCACAAATTTTCTGTCAKTGGAGAGGG TGAAGG FGA IT.CAAC A TACGG AAA 3000 
ACTTACCCTTAAATTTATTTGCACTACTGGAAAACTACCTGTTCCATGGGTAAGTTTAAACATATATATA 3 1 GO 
3160 3170 3180 3190 3200 3210 3220 
CT AACTAACCCTGATTATTTAAATTTTCAGCC AACACTTG TCACTACTTTCTGTTA TGGTG TTCAATGOT 3220 
■C I'CGAGA FACCCAGA FCA FA f GAAACGGC ATSAC7TTTTCAAGAGTGCC ATGCCCGAAGG TTATGTACA HVUO 
GGAAAGAACTATAT77TTCAAAGATGAC6GGAACTACAAGACACG FAAGf f TAAACAGT FCGG I'AC fAAC 3360 
lAACCATACATAT r rAAATrrrCAGGrGCrGAAGTCAAGTTTGAAGGTGATACCCTrGTTAArAGAAFCG 3430 
AG TTAAAAGG FA !TGAT T r T A A AG A A G A T G G A A A C AT T C T 7 C : G AC A C A A A T T G G A A T AC A A C 7 AT A A C TC 3500 

3510 3520 3530 3540 3550 3560 3570 
AC ACAATGTATACA7CAT3GCAGACAAACAAAAGAAT-GGAA TCAAAG T FG TAAG F I' FAAAC I' TGGAC r FA 3570 
C ! A AC FAACGGA f FA FA r F FAAA F !* ■* rCAGAACTTCAAAATTAGACACAACATTGAAGATGG AAGCGTTC 3640 
AACTAGCAGACCATTATCAACAAAA'ACTCCAATTGGCGATGGCCCTGTCCTTTTACCAGACAACCATTA 3710 
CCTGTCCACACAATCTGCCCTr rCGAAAGA 'CCCA ACGAAAAGACAGACC ACATGCTCCTTCTTGAGTTT 3/80 
uTAACAGCTGCTGGGATTACAC ATGGCATGGATGAAC FA FACAAA TAGCATTCGTAG AATTCCAAC TGAG 3860 

3860 3870 3880 3890 3900 3910 3920 
CGCCGGTCGCTACCA ITACCAAC ! FG FCTGGTGTC AAAAATAATAGGGGCCGC7GTCATCAG4G7AAGT7 3920 
TAAACTGAGTTCTACTA ACTAACGAGTAATAT FTAAA F F F I'CAGCAVC I'CGCGCCCG FGCC FC l"G AC F I'G 3390 
I.W, f'CCAA r TAG FG F f'CAACATCCCTAGATGCTCTTTCTCCCT'aTGCTw CviACw »CG TAT F F f JT> T f*A T '10*0 
TATCAAAAAAACTTCTTCTTAATTTCTTT6 FT FT f FAGCT7C~TTTAAG7CACC7CTAACAATGAAA7 r G 4100 
PjTAGATTCAAAAArAGAArTAATTCGTAATAAAAAGTCGAAAAAAArTGTGCTCCCTCCCCCCATTAAT 4?00 
4210 4220 4230 4240 4250 4260 4270 
AATAATTCTATCCCAAAATC I'ACACAATGTTCTGTGTACACTTCT T ATG F F r i* ITT TACTTC FGATAAAT 4270 
F F r r F F i GAAAC A TC AT AG A AAAAACCGCACACAAAATACCTTATC AT ATG 77 ACGTTTCAGT7TA7G AC <13<10 
C3CAATTTTTA"TTCTTCGCACGTC FGGGCCTCTC ATGACGTCAAATCA FGC i CA FCG TGAAAAAGTTTT 44 !0 
'oGAGTATTTTTGGAANTI t'CAATC AAGTG AAAGTTTATGAAATTAATTTTCCTGCTTTTGCTTTTTGGG 4480 
fiii'TTTCCCCTATTGTTTGTCAAGAGrr l'CGAGGACGGCG"TT T TC TTGCTAAAA TC ACAAG I'AlTGAfGA 4fi«) 

4560 4570 4580 4590 4600 4610 4620 
GCACGATGCAAGAAACATCGGAAGAAGGTTTGGGTTTGAGGCTCAGTGGAAG6 I'G AG F AG A A G T T d A T A A WAQ 
TTTGAAAGTGGAGTAGTG r C TATGGGGTTT'TGCCTTAAATGACAGAA I'ACAT FCCCAATA TACCAAACA ';C^0 

lAAC FG 1TICC'rAC"AGTCGGCCGTACGG3CCCrrTCGTCTCGCGCGTTTCGGTGATGAC3GTGAAAACC 4/60 

T G AC AC A T GC AG C TC C C G G AG AC GG A C A T TG T C T G ~ A AG C GG A F GC C G GC AG C AG A ^ A AG C C CG 4830 
TCAGGGCGCGTCASCGGG FG V FGGCCGGTG T"GGGGCTGGC •* F AA C T A TG C G C C A 7 C AG AG C A G A * r TG FA 4900 
CXAGAGTGCACCA PA FGCGG TG T GAAATACCGCAOAGATGCGTAAGGAGAAAATACCGCATCAGGCGGC 4970 
C: i'AAGGGCC IX G J*GA i'ACGCC TAT^TTTATAGGT TAATC TCATGATAATAATGGTTTCTTAGACGTCAG K04C) 
G T GGCAC TTTfCGGGCiAAATGTGCGCGGAACCCCTATTTGTTTATTTTTCTAAA TACATTr AAATATGTA f>1 \0 
TCCGCTCATGAGACAArAACCCTGATAAATG^TTCAATAATATTGAAAAAOGAAGAGTATGAGTATTCAA 
CATTTCCGTG TCGCCC ( I A f I CCC T TrTT-GCGGCATTTTGCCTTCCTGTTTTTGCTCACCCAGAAACGC b2<iO 
5260 5270 5280 5290 5300 5310 5320 
I'GG TGAAAG TAAAAGATGC r GAAGA IX AG f 1 6GG : GCACGAGTGGGTTACATCGAAC I'GGA I C CCA AC AG 5320 
CGGTAAGATCCTTGAGAGTTTTCGCCCCGAAGAACGTTTTCCAATGATGAGCACTTTTAAAGTTCTGCTA bU90 
TCTCGCCCGCTATTATCCCGTATTCACGCCGGGCAAGAGCAACTCGGTCGCCGCATACACTATTrTCAGA £W\ 
ATGACTTGGTTGAGTACTCACCAGTCACAGAAAAGCATCTTACGGATGGCATGACAGrAAGAGAATTATG SS30 
CAG IXC I'GCCATAACCA I GAG f GA : 'AACACTGCGGCCAACT7ACTTCTGACAACGATCGGAGGACCGAAG FSCO 

5610 5620 5630 5640 5650 5660 5670 
GAGCrAACCGCTTTrTTGCACAACArGGGGGATCATGTAACrCGCCTTGATCGTTGGGAACCGGAfJCrGA 5670 
A ."oAAGCCATACCAAACGACGAGCG TGACACCACGATGCC r G TAGCAA r GGCAACAACG TTGCGCAAACT 57'10 
A? rAACTGGCGAAC TAC7TACTCTAGC 7TCCCGGC AACAAT TAA TAGACTGGATGft AGGCG'jATAAAGTT 58 1C 
C.CAGGACCACTTCTGCGCTCGGCCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCTGGAGCCGGTGAGC hHBO 

0 » GGG rc rCGCGG FATCA T TGCAGCAC XGGGCCAGA IXG I'AAGCCC f'CCCG r A TCG TAG f TA TC TACAC G9GC 

5960 5970 5980 5990 6000 6010 6020 
o A C G GG G AG T C A G GC A A C T A T3 G A 7 G A AC 6 AAA Tj\ G AC AG ATC GC TG AG A T AG G TG C C TC AC To AT T A AG 6020 
Cm I I'GG '"AACTG IXAGACCAAG r •* !"AC I'CA ! *A TATACTTTAGATTGATTTAAAACT TC ATTTTTAA f V I' A 60CO 
AAAGGA IX! AGG I GAAGA I CC ITT T TGATAA7CTC ATC ACCAAAATCCC T TAACGTGAGTTTTCCTTCCA fcilfiO 
CTGAGCGTCAGACCCCGTAGAAAAGATCAAAGGATCTTCT T GAGATCCTTTTTTTCTGCGCCTAATCTGC G?30 
T GCTTGC AAACA AAAAAACCACCGC TACCAGCGGTGGTTTGTTTGCC3GATCAAGAGC 7 AC C A AC PC f T T 6300 

6310 6320 6330 6340 6350 6360 6370 
'XCGAAGG I A AC VGGCTTCAGCAG AGCGCAGATACCAAATACTGTCCTTCTAG TG TAGCCu f AG !" i'AGG 6370 
C C A C C AC T T C A AG A AC TC TG TAG C A C C GC C 7 A C AT ACC IXGC TC I'GC i'AA IXC fGT TACCAGTGGCTGCT H4<U> 
GC C A G TG C C GAT A AG TC G TG TC T 7 AC C GCi G r TG G A C f C A A G AC G A TAG TT A C C GC A T A AGG C G CAG C G G T GG:0 
CGGGCTGAACGGGGGGTTCGTSCACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGArACCr fififlO 
AC AGCGTCAGCATTGAGAAAGCGCCAC^CrrCCCGAAGGGAGAAAGGCGGACAGGT A rCCGG TAAGCGoC 865C 
6660 6670 6680 6690 6700 6710 6720 
AG G G 7 C G G A AC A G G A G A G C G C A C C A : j G G AG C T T C C AGG G G G A A AC G CC FGG I A ICY TTATAGTCCTGTCC 6720 
GG T T PC G C C AC C T C T G A C T T C AG C G "CG A V T T TG TGA TG C TC G T C AG GG G G G C GG AG C C T A T GG a\ A A A A ti /90 
CGCC AGCAACGCG6CCTTTT TACGG "TCC7GGCCTTTTGC rGGCCTTTTGCTCACATGTTCTTTCCTGCG 6060 
TT ATCCCCTGAT PC I'G ! GGA TAACCG T ATTAC"GCC \ I* rGAGTGAGCTGATACCGC TCGCCGCA6CCGAA 8930 
C?ACCGAGCGCAGCGAGTCAGreA(K:GACn1AAGCGGAAGAGCGCCCAATACGCAAACCGCCTC' r CCCCt:.C 7000 

7010 7020 7030 7040 7050 7060 7070 
OCCrroOCCOATTCAr rAATQCAOCTOGCACQACAGCTTTCCCCACrCGAAAGCGCCCACTCACCCCAAC 7070 
GCAAT T AATG TGAG I I'AGCTCACTC A T ~AGGCACCCCAGGCTTTACAC T 1" f ATGCT TCCGGCTCGTATGT 7140 
i'G rGTGGAATTGTGAGCGGA r A AC A A T T "C AC AC A GG A A A C A(i CT A T G AC C A T G A " T A C GC C A A G C TG TA />! 10 
AGTTTAAACATGATCTTACI'AACrAACTATTCTCA r r I'AA ATTTTCAGAGC7T AAA AATGGC TGAAA TCA 
C T C AC AACG A TG G A T ACG C T AAC A A ^ i" i GGAAATGAAAT V319 
AniACCA IGAI'l ACGCCAA3CTTr3CA7GCCTGCA66AATTCGATATCAAGCTTATCGATACCGTCGACCT 70 
.': C, AG G A T C A C A A G A A AT T GG AC C A A C ~ AC. C C A C A T C C A F f AI'GCCACCCGCGG I" I* I'C I'AAG FGAG FT FAA 1 MO 
FTTTGAo TT FACGAC FACAAAAATGTG ~TCTTTAATAACTATCT T CGACTTGAGTCTATTCTG TAT G ACT 
Ar.TTGTTGAGTGATTTTTCATTGAGAAAATATTAAAAGGAACATTATrTACTTTGC rTATTTGCCCTAAC 28(3 
TTTGATTTACiTTTTTCGATCAACTAGA"CTTACAAAACTTGCAATACAATTCCATTTTCAGATTACCCT"C 3fiO 

360 370 380 390 400 410 420 
:CCACGTCTCGCCACGTCAGCAACCGCTTCAGCAACI'AAGCCAAA rrCCAAr r rrCC:ACAAAf GI'CAACA 420 
TCCAGGC TTCACACTCCACAGTCAAGA ATATCGAAAAT . r GCiTAA6 AA I' T f r A r ! r I'KAGC rOAAAC ! f'G r <!90 
^TAAAArGCCCAGAAAAGAAGATGATAAAAATGTAGTTTTTTTGCAAAACTTCCACCTTTATTGCTCTAA 660 
7 A T G AC G G C T TA 7 ATC 7C A A TT T T C TT :] A3 TT T 7 A TC A A A AAA T r F rcXAC FA F AC AAA TO I AGAAAAG F 6HO 
A - ! I i GC AC AAA I !;TGrCAGF I' G AC A GC T T T G 7 A AT AG A TC C A A A TG C A A C C TAG A T AC A A G C T G T T A A /CO 

710 720 730 740 750 760 770 
AG i G G A A G G AG C G C A AG i'C ! AT AC I GG AAA I'AATGATCTGAAACAAATTTGTGCTATTCTC AAA7GTTTA 770 
AGACATGTrTTGAAGATTTTTrCAAATTCGCACTAGTTTCAGAACCTTCCTTTTTCTATOAAAAAGTAAA iVAO 
•* A A A A AC T A T T T C A A AC C C T C AC C S C C AC C A T G T T fCAACTCTTAATTTTTATAAA A7TT7GC AATTTAC 910 
4AA i'CGCC IXCCC rTGCCCGAAAAG^GCCCACCAAAATCAATTTCTCGGCTTCATAArGAC r r r FA A A f~T flflO 
:a i*G rGAGAAAACACAGAAGAGGCTAACTAAATTGACAGGGACAiiGTTSTCCCTC: I* IX TCCC IXC r IX I f: 1050 

1060 1070 1080 1090 1100 1110 1120 
C C G C C TC C T C C T C C C G r T C C A T C TC C A AC A AC A AC A A T TT TC C AA T TTCG T TG TCC A r T f T G C TA ("AAA 1120 

JA 1 I I'Ci rGrGKKjAAGGAAACTACACGoGSAGACGGTCAATTAATTCGAArGAKACiC'A I'GKCAA .*' I* AC I'C 1 130 

T T T C GG A A A T TG ATG A A T A A AG A TAG A :X G A IX A C AC TG G C T GG T AG T A G T A T G A o T G T A G \ AT T G C 7 T 1260 

T 7 TC ATC G T C TC A AC T TG C G C A TG AG ' " N C C CC C GC 7 C T C AT C A C TG AC A AT T A A TG TC G G G T T T T A TG 1330 

CCCTCT7TCCTA' r XCGCCACTC Al CTGSGTTACCAC AAACTGGAA r AC A i I' IT AC f ACTA F I'CAAGCC 1400 

1410 1420 1430 1440 1450 1460 1470 

AT TTATTTTGATA7 FTAA F I" iTGTGCA^TTAGGGATAAAC ACGAC TT r FAAAAG7TTA fTTAAAAAAACC 14/0 

ATATTTTCGaTT"TAAAAAA CC I'GAAAAoTTTCAAAAA A T CAA TA AA' r A T JXCC ! AACAAATTGTATGGC 1o40 

l AAAAI I I :'A 1* I' TCTACTGTTGACAA"ATCTT TA TA I G TATCAC7GT7TTCCATCTCAAAACC TTGAA r C 1610 

CCCCAAGTTATAGCAACCTCCGTGTCACATTTCCCATGCTArGAArCGCTACTCAGCACATATCCAAAAA 1600 

TTAAGCTAGACT:GTTGATAATrAr rGGCiCACGCGTAATAAAGnXAAGCAr. I' I'AGAATTTTAATTCAAGC 1 75C 

1760 1770 1780 1790 1800 1810 1820 
ac agattatctatcaaattcaa 'tc r r icaacattcacccagttcg :acaa r r r iccatgctttttggccc 1820 

ATTAAAAAACTT re: !*CACC rCT"CATCCATC"CACTCG I'A I'C A TAAAAAG TAT ACC AAAAGCC CGAC I'C I 1090 
AC TTTTT AAGAOAAGGAGATAC*GAGCCACA*"GGCG ' G TGACCCT T TTCATCTCGTCCG I* TCGG I'C TC AA VJUC 
ATTCACGCTCATACI AACTCTTCAAATAGCCATAGACCTCCTTGTTTTCrrCrTCCTTrTGACTCGCGCC 3030 

■a r r r r r ig tggctgcctgaaagccgggaaaatttagt atatttatgagc r i'atctttatgc aatacata 2 :co 

2110 2120 2130 2140 2150 2160 2170 
AAAAACGAG3CAA I TAAAAATA"TAAAAT r A A Yd AGG TT G T AG A TG r AG A ! I IGG AAAAGAAGAAAAAA 2170 
a.; AAAACAAATAGGAACCGCCAGA rCAAAATTCTATT I'AAAGG f rTTCAAGATGTTTAGGC AAGATTCGG i!V:40 
C ; i A AC A C A AA A C T G A A G T G C C IGC A rAAATCTAGTGTAACGTTTAGATTGAACTCGG AAA TCC TAAGCC '33 ' 0 
TG AACTATAGCCT FA i" ■ C I'AGATCT'AGT'GCGCA 1'AAGCTCAAGCCCAAGCAGAA ATGAC TTGCA I" I" I'A 2380 
G : " I AAGCC IAC A T 7 G AC T 7 CC T FGC " I'CAG rc.TAATCCAGACTAGATTTCCAAGAGAC' I ! 1 I CAA MM 2*'! f 30 
AAATGTTTCCAGTTTC I" T (i F I' A C 1' T' A A A A T C T T A A TCi C C C TG T G A TG C G T A A A A TC G T T A T C C C T T T C TC 2520 

icacac i i rcAA ttacag attc atcaaagattggtatcaagcc aaagacg rc tggac r j aaaccaccc rc 2590 

ATCATCAACCACTTCATCAAATAAi ACAAA \' I CATTCCGTCCGTCGAGCCCTTCGAGTGGC AATAATAAT 2b(50 
CTTGCCTCGACGATATCCACATCTGCGAAGAGCTTAGGTATCCGATCCTTCCGGCTTC7TTTTAGAAATT :> /30 
ATATTATTTCAGAATCATCATC AACGTACAGC f C T ATT TC G A A TC T A A AC C G ACC T AC CTC C C A AC TC. C A 70OO 

2810 2820 2830 2840 2850 2860 2870 
AAAACC T TC l AGACCACAAACCCAGCTAoTTCGTGTTGCTACAACTACAAAAATCGGAAGCTCAAAGCTA 2870 
CCCGCTCCGAAAGCCGTGAGjCACCCCAAAACTTGCTTC rGTGAAGAC TAT FGGAGC AAAAC AAGAGCCCG 2S40 
ATAACAGCGGTGGTGGTGGTGGTGGAATGCTGAAATTAAAGTrATTCAGTAGCAAAAACCC A rCTTC.i: PC 30 iO 
A r C G A A T AG CC C AC A AC C T AC G A G A A AGG C G (U*GG C GG TG C C T C A AC A AC A A A C T T TG TC G A A A A TC G C f HOflO 
OCCCCAGTG AAA AGT GGCC TGA AG CCCCCG ACC AGTAAGCTGGG A AGTGCC AC GTC TATGTCG AAGC TTT 3150 

3160 3170 3180 3190 3200 3210 3220 
GTACGCC AAAAGTTTCCTACCGTAAAACGGACGCCCCAATCATATCTCAACAAGAC TCGAAACGATGCTC 3220 
AAAGAGCAGTGAAGAAGAGTCCGGATACGCTGCATTCAAC AGCACGTCGCCAACGTCATCATCGACGCAA 3290 
GC TTCCC TAAGCA';*GCATTCCACATC T TCC AAGAGTTCAACGTCAGACGAAAA6TC TCCGTCATCAGACG 33G0 
ATCTTACTCTTAACGCCTCCATCGTGACAGCTATCAGACACCCGATAGCCGCAACACCGGTTTCTCCAAA 3430 
PA T T ATC AAC AACC C TCT TG AG G A AAA ACC AAC AC J'GCaCAG I'GAAAGGAG TGAAAAGC ACAGCGAAAAAA 3500 
3510 3520 3530 3540 3550 3560 3570 
GArCCACCTCCAGCTGTTCCGCCACG^GACACCCAGCCAACAATCGGAGTTGTTAGTCCAATTATGGCAC 3570 
A T A A G £ A G T TG A C A A A T G AC CC C G TG A T A T C TG A A A A A CC AG A AC C TG A A A AG C TC C A ATC A A TGAGCAT 3640 
0 G A C AC G AC G G A C G T T C C A C CG C T T C C A C C TC r A A A A T C A S T T G T TCC AC T TA A A A TG AC T T C A A TC C OA 3/10 
CAACCACCAACG T ACGATGTTC TTCTAAAACAAGGAAAAATCACATCGCC T G7CAAGTCGTTTGGATATG 
ASCAGTCGTCCQCGTCTGAAGACrCCATTCiTGGCTCATCSrXiTCGCiCTCAaGTGACTCCGCCGACAAAAAC 3850 

3860 3870 3880 3890 3900 3910 3920 
TTC TG3TAATCATTCGCTGGAG A11AAGKA I'GCiG AAAGAAT AAGACATC AG AATCCAGCGGC TACACCTCT 3920 
GACGCCGG rGT I'GCGA i"G I'GCGCCAAAATGAGGGAGAACCTGAAAGAATACGATGACA i'GAC I CG fCGAG 399C 
C A C AG A A CG GC T A T C C TG AC AA C T TCG AAG AC AG T rCCTCCTTGTCGTCTGGAATATCCGATAACAACCA 4060 
GC TCG AC GAC AT ATCCACGG AC G ATT rGTCCGGAGTAGACATGGCAAC AGTCGCC I CCAAACA t'AGCGAC 4 130 
TATTCCC AC TTTGT rCCCCA 1CCCACG7CTTC TTCCTCAAA6CCCCGAGT CCCCAu I'CGGTCC TCCACAT 4200 
4210 4220 4230 4240 4250 4260 4270 
CAGTCGATTCTCGATCTCGAGCAGAAC AGG AGAATGTGTACAAAC~TCTQTCCCAG TGCCGAACGAGCCA 4270 
ACGl'GGCGCCGCTGCCACCTCAACCTTCGGACAACATfCGCTAAGATCCCCGGGATACTCATCCTATTCr '^340 
CCACACrrArCAGTGTCAGCTnATAAGGACACAATGTCTA T CCACTCACAGACTAGrCGACGACCI !CI ' W-Q 

cac a aaaaccaagc tat~caggccaatttc a r ccacttgatcgtaaatgcc accttcaagag r tcaca rc 44«o 

C A C C Ci AG C A C AG A A T GO. C G G C l*C rc TTGAGCCCGAGACGGG "GCCGAAC TCGATGTCG AAATATGA « TCT 4560 
4560 4570 4580 4590 4600 4610 4620 
TC AGGATCC TACTCGGCGCC- 1" i'C£C<SA6<Sri<SAA6CTCTACTflfiTA TC l'A I'GGAGACACa I ICCAAC I CC 4620 
AC AGACTATCCGA ITj AAAAATCCCCCGCAC AT fCTGCCAAAAGTGAGATGGGA TCCCAAC TA TCACTGGC 4*590 
TAGCACGACAGCATATGGAICI'C I'C AATGAGAAGTACGAACATGC TATTCGGG ACA TGGCACG TCACTTG '1760 
GAGTGTTACAAGAArjAC fG r CG AC T C A C T A AC C A A G A A AC AG G AG AAC T A 7G G AG C A T TG T T I' G A I'C I i I 4030 
I !GAGCAAAAGCTTAGAAAACrCACTCAA:;ACA'r ^GATCG ATCCA ACTTGAAGCC':" GAAGAGGC AA FACG UOOO 
A r rCAGGCAGGACATTCCTCATTTGAGGGATATTAGCAATCATCTTGCATCCAACTCAGCTC/^TCCTAAr: 4970 
GAAGGCGCTGGTGAGCTTCTTCGTCAACCA IX fCTGOAATCAGTTGCATCCCATCGATCATCGATGTCAT 5040 
f <i !" C G TC G A A A A G C A GC A AG C A GG AG A AG A TC A6C TTGAGC rOG !' T f'GGC AAGAAC AAGAAGAGC f'GGA I 5 ! 10 
CCHCTCCTCACTCTCCAAG I' I C ACCAAGAAGAAGAACAAGAACTACGACGAAGCACATATGCCA rCAAT 5180 
rCCGGATCTCAAGGAAC IX I' l*G ACAAC4TTGATGTGA7TGAGTTGAAGCAAGAGC i'CAAAGAACGCGA iA 5250 

5260 5270 5280 5290 5300 5310 5320 
GTGCACTTTACGAAG TCCGCC I" fGACAATC TGGATCGTGCCCGCGAAGTTGATG f I'C PGAGGGAGACAG f 5320 
t:AAvAAta I I liUAAAU.t- AMWU. AAl,L AATTAAAGAAAGAAGTGGACAAAC TCACCAACSG rCCAGCCAC ! 339Q 
CG TGCTTCTTCCGGCGCC'TC AATTCCACTTATCTACGACG ATGAGCATGTCTATGATGCAGCGTGTAGCA 5U60 
GTACATCAGCTAGTCAATCTTCGAAACoArCCTCTGGCTGCAACTCAATCAACCTTACTGTAAACGTGGA 5530 
CA I CGC I'GGAGAAATCAGTTCGATCGTTAACGGGGACTTG AAGCAGCAGG AArTCTTCCTGGGCTGTAGC httX) 

5610 5620 5630 5640 5650 5660 5670 
AAGG f'CAG TGGAAAAG i" I'GACTGGAAGATGCTGGATGAAGCTGTTTTCCAAGTG f rCAAGGAC TA I A f f i' 5670 
C i AAAATGGACCCAGCCTCTACCCTCGGAC 7AAGCAC TGAG I'CCA l"CCA TGGCTACAGCATC AGCC ACG r B/40 
GAAACGAGTGTTGGATGCACACCCCCCCGAGATGCCrCC rrG<;CG I'CGAGGl'G I'CAATAACATATC AG TC BU10 
TCCCTCAAAGGTCTGAA6GAGAAA TGCGTC GACAG CCTGG TGTTCGAGACGCTGATCCCCAAGCC6A I'GA 5880 
r*3 C AGO AC T AC A T AAGC C ICC I GC I'GA^GC ACCGGC6CCTCGTCCTCTCGGGCCCC AGCGGCACGGGCAA 5960 
5960 5970 5980 5990 6000 6010 6020 
GACCTACCTGACCAATCGCTTGGCCGAGTACCTGGTKGAGCGCTCTGGCCGTGAGSTCACAGAGGGCATC 6020 
G fCAGCACC f rCAACArSCACCAGCAGTCTTGCAAGGATCTGCAACTGTATCTT I'CCAACC I AGCCAACC 6000 
A G A T AG A C C G G G AAA C A G G A AT T GG GG A Tfi rGCCCCTGGTGATTCTAfTGQATGACC TGAG rSAAGCAGG 6160 
CTCCATCAGTCAGTTGGTCAATGGGGCCC rCACCTGCAAGTA T CA T AAATGTCCCTA TA T T A fAGG !"ACC 6200 
ACCAATCAGCC FG FAAAAATGACACCCAACCATGGCT FGCAC i fG A GC T T C AG G AT G T TG A C C TTC T C C A 6:300 

6310 6320 6330 6340 6350 6360 6370 

ACAACGTGGAGCCAGCCAATGGCTTCCTC-G TTCG rTACCTGAGGAGGAAGCTGGTAGAGTCAGACAGCGA 6370 
CA TCAATGCCAACAAGGAAGAGCTGCrrCGGGTGCTCGACTGGGTACCCAAGCTGTGGTA T CATCTCCAC 8^0 

^C.CTTCC T TOAO A ^OCrt C OCT. -O^O/ i£ TT C CT CAT CO OOOOTTCO TT Cj r f« J'O rCQ^CTOCCAT TC. >»C 10 

GC A T T G AGG AC T TC C GG AC C TG G ^ r C A r f G AC C TG TG G A AC A A C T C T A TC A T f C C C FA fCTACAGGAAGo 6HH0 
A3CCAAGGATGGGATAAAGG rCCAl SGACAGAAAGCrGCTTGGGAGQACCCAG f GG A ATGG G TCCGGG AC 6660 
6660 6670 6680 6690 6700 6710 6720 
AC AC T ~r C C C TGG C C A TC AGC C C A AC A AG AC C A A f C A A AGC TG T AC C AC C T G C C C CC AC CC AC C G TG GC C C 6720 
U I CACAGCATTGCCTCACCTCCCGAGGATAGGACAGTCAAAGACAGCACCCCAAGTTCTCTGGAC TCAGA 67:5(5 
TCCTCTGATGGCCArSC I'GC fGAAAC T TC AAGAAGC I'GCC AACTACATTGAGTCTCCAGA !*CG AGAAACC 6860 
ATCCTGGACCCCAACCTTCAGfiCAACACT T rAAGGGTTCGGCAATCACTG FCACCCCCGGaCACCACAAC 6930 
GC I'GGCATCAGCTATCT TAGCTCCTCC T CTCCCCTCTCCTCTTTCAGAGCAC I GGC i'C TCCAGCCCCACV: 7000 

7010 7020 7030 7040 7050 7060 7070 
AGGAGAACAGGaGGGAGGAGGAGATGAAAGAGGAGGGACAGG r fC riSGTSCTCTACCTTTGAGAAC I l*C 7070 
C TAGGAACGAA FGGTGGGGTGGCGT I* TGGG AACTTGTGCCCCC TAAAC ACATTTAC TGGCC TCCTC T AA r / ' 40 
SACrrrGGGGAAAAGATGATTC "GGGTCTT TCCCTTGAC I i'C i' rG T TTCAATTAC^ AACTCC TGCGCT Tl' WO 
C rGGGCiAGGGGTTCAGAAAACA^CAAAAC AC TGCAGC AGTTCCCCGGAAT TCAGC" TGGAC 1*1 AACCAGG 72J3C 
CTGAACTTGCTCAAAAGAAGCCGAATTCCAGCAC:aCTGGCCTCCCCATGGTAT TG/: !*A I'C i GAGC fCCGC 7350 
ArCQGCCGC rGTCATCAGATCGCC ATCTCGCGCCCGTGCCTC rGACTTC FAAG TCCAA I PAf rCTTCAAr 7420 

ATCCCTACATGCTCTrrcrCCCrGlGCrcCCACCCCCT/MTTTTGTTATTATCAAAAAAACTTrTTrrrA 7490 
A I n C I" r rG r I I I I l ACCTTCTTTTAAGTC ACCTCTAACAATGAAATTGTGTAGATTf AAAAA7AGAATT 

AA i 1 CG I'AA rAAAAAGTCGAAAAAAATTCTSCTCCCTCCCCCCATTAATAATAATTCTATrrrAAAATrT /B'K) 
ACACAATGT rCTGTGTACAC ITC I" I A IC I rTTTTTTACTTCTGATAAATTTTTTTTGAAACATCAFAGAA 7700 
7710 7720 7730 7740 7750 7760 7770 

AAAACCGCACACAAAA rACCTTATCATATGTTACG TTTCAGfT TATGACCGCAAT i T r r A f FT fVrfGCA 7770 
C'JTCTGGGCCTCTCATGACGTCAAATCATGCTCATCGTGAAAAAGTTTTGGAGTATTTTTGGAA'-'fTTr 7f>i r 
AAlCAAGTGAAAGTTTATGAAATTAATTTTCCTGCTTTTGCTTTTTGGGGSTTTCCrCTATTGTTTGTCA /<\k 

AAoAAGCiTTTuuGT : TbAfafaCTCAGTGGAAGGTGAG TAGAAG I rGAlAAT I" ICAAAo I GfiAS r AG K ! C r P.Of'f, 

8060 8070 8080 8090 8100 8110 8120
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^I!?r G 5II TTTGCC7TMATGACAGAA ^^ nvjo 

.i.CGrACGGGCCCTTTCGTCTCGCGCGrTTCGGTGATGACGGTGAAAACCTCTriACArATGCAG'-T-r^G'; o |Qf . 
5 - A " 5 SJ[S A £ AGC rTG fC fG rAAGCMA I GCCGGGAGCAGACAAGCCCGTCAGGGC3CG7C AGCGGG YGX R-?GO 
TUiii.GGGTGTCGGGGCTGCiCTTAACTATSCGGCATCAGAGCAGATTGrAC fGAGAG rGCVT.A TATf CGG 83?0 
. '-T(jAAATACCGCACAGATGCG T AAGGAGAAAATACCGCA rCAGGCSGCC r FAAGGSCC FCG f'GAI'ACSC HUGO 
8410 8420 8430 8440 8450 8460 8470
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'fT A ; ' IFAltStl I A £S?J I W TAATAATGGTTTCTTAGACGTCAGGTGGCAC T r r TCGGGG*AA FG 8470 

'T-5^ jA T^r^J A TIT^ TTTATTTTTCTAAATACATTCAM 0540 
•.„T"../.TAAATv.iCTTCAATAATATTGAAAAAGGAAGAG I'A fGAGTA~TGAACAT7rf rSTG T rGCfC T TATT 8t> 10 

^■;$IIJTIISCGGCATTTTGCCT-CCTCTTTTTGCTCACCCAGAAACGCT€GTGAAAGTAAAAGATS WHJO 
AA..,AiCAGTT, J uGTGCACGAGTGGGrTACA-CGAACTGGATC7CAACAGCCCTAAG'\rCCTTGAGA(irrr 8750 

8760 8770 8780 8790 8800 8810 8820

TCGCCCCGAAGAACG FT r I'CCAAi'GA rGAGCACTTTTAAAGTTCTGCTATGTuGCGIJGG , r A F I A ."CCG i 002C 
A| ! GAC.GCCtaGGCAAGAGCAAC t CSGTCCCCGCATACAC FA I' FC rCAGAATGAC T T3GTTG AG 7ACTCAC 6890 
-AG I CACAGAAAACCATCTTACGGATGCCA FGACAG l"AAGAGAATTATGCAG7GC7aCCA7 AACCATGAG 8960 
7GATAACACTCCGGCCAACTrACT7C7GACAACGATCGr,AGGACCGAAG6AGCrAA^CGr.7TT7TTGCAC 0030 
AACATGGGfifSATCATGTAAC TCGCC I" TGATCGTTGGCiAACCGG AGCTGAA TG AAGCCATACCAAACCACG 9100 

9110 9120 9130 9140 9150 9160 9170
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AGCGTGACACCACGA rGCC I'G FAGC AA7GGCAAC AAC'G X FGCGCAAACTATTAACTGCiCGAAC FAC F F AC 9170 

.0 .AGCTTCCCGCCAACAArTAAI'AGAC73GA7C;fiAGGC3GArAAAGI I G C AG G AC C AC TC TGCG C TCG Q_'i0 
CiCCCTTCCGGC7GGC7GG TrrATTGC~CA7AAATC rGCiAGCCGfiTGAGCGTGGGTC TCGCGG FA I'CA I TG 9310 
CAGCACTGGGGCCAG A TAG f"AAGCCC~CrCf:jTATCGTAG F FATCTACACGACGGGG AG r CAGGCAAC TM 9180 
(iGAFGAACGAAATAGACAGA FCSC FGAGAT AGGTSCCTCAC i GAF7AAGCA~7GGTAAC7G fCAGACCAA 9450 
9460 9470 9480 9490 9500 9510 9520
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G I' F FACTO ATATA7ACTTTAGA F I'G ATT7AAAACTTC A I Y ITrAATTTAAAAGGATC TAGS 7QAAGA \Cf >)W> 

MM I jA IAATCTCA7C.ACCAAAA rCCCTTAACGTCiAGT^7TCG~T^CAC-CAGCG TCAoA'JCCCa I AGA 3030 
AAAGA rCAAAG3ATCr7CT FGAGA I CCTTFTTTC 7GCGCGTAATCTGCTGCTTGCAAACAAAAAAACCA S660 
CCGCTACCACCGG TGG i TFGTTTGCCGCA rCAAGAGCTACCAACTCTTT T TCCGAACiG I'AAC I GGC ' !CA SP'.V! 
GCAGAGCGC AGA "AC CAA AT AC TG FCC I FC TAG T G TAG.C 3 TAG " • AGGCCACCAC f TCAAGAAC C FGT 0800 
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,VoCACCGCCTACAIAa:rCGCTCTGCrAATCCTGTTACCAGTGGCTGCTGCCAGTG3CGATAAGTC6TG T 3S7C 
C I I ACCGGG TTGGA r TCAAGACGATA6T I'ACCGGATAAGQCSCAGCfiGTCGGGCTG AACOGGCiUG r rCC I 
G-ACACAGCCCAGCrTGGAGCCAACGACCTACACCfiAACTGAGATACCTACAQCGT3AGCATT6AGAAAG 1C0i0 
CGC-f ACCCTTCCCGAAGGGAGAAAGGCGGACAGGTATCCGGrAAGCGGCAGGGTCGoAACAGGAGAGCGC 1COHO 
ACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTA TAG I'CCTGTCGGGTTTCGCCACCTCTGACTTG 10K;0 
10160 10170 10180 10190 10200 10210 10220
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AGCGTCGAr rTTTGrGArfiCTCGTCASGGSaGCGGASCCTArGGAAAAACGCCAfiCAACOCGtSCC I I I I I WM) 

ArGG f ICC I'GGCCrTTTGCTGCCCTTTTGCTCACA rG I' TC ITICCTGCGTTATCCCCTGATTC7G76GAT 10230 

AArCGTATTACCGCCrnaAGIGAGCrGATACCGCTCGCCGCAGCCGAACGACCGASCGCAGCGAGTCAG WJW 

TilAGCGAGGAAGCGGAAGASCSCCCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGATTCATTAATC lOO' 

CAGrrGGCACGACAGGTTTCCCGACTSGAAAGCGGGCAGTGAGCGCAACGCAATTAATGTGAGTTAGCIC ICGCu 



10510 10520 10530 10540 10550 10560 10570
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ACTCATTAGGCACCCCAGGC rTTACA'J i' 1' rArGCTTCCGGCTCGTATGTTGTGTGG AA T TG I'GAGCGGA f 10<o/0 
AACAA r riCACACIAGGAAACAGC I 1C59'4 
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$ACCAT6AnACGCCAAGCTTGCArGCCTGCA6GTCGACTc toqMATSTAA.\CCT6 rCATTTCTGTG JO 
rATTTCAGCA^CGCAGAGAGCACCATAAACAGCTACAACAAGAGCACGAACGrCGGCTCCAACAGAAAAA 1 10 

'^ AAAA £^f'^^ A ^TCACCACA -, "GC 1"CTGATAATTGCCATTTCC TTCGAT""TC I 6ACTTTCAA rTG 
I GATATcjGATAACCTAAAATCTGCCATTCAAAATGAAAACTCTTAAAGTT I AAGAGGC f rTCAATGArrc >UQ 
CGfJGAC ITTCCCACC I CATCG7T7T TCCCTTCTA rCTTATTcTAA r ITTTTGTA" [T7GTG rCAGTT I GC 350 

360 370 380 390 400 410 420
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.' A I^I^i^ TCACC rT r rT ™CCGT r r AATCTCTCTATAGTATTGTTCGGG rGGTTTCAAGATGAAACG '\20 
r i I ruTGTG r AGCA ITTTTGAAACACAGcGxGAAAAA fGCAGAAAC rATCATCC'iCAATGC A I' PAAGTG'- '190 

A ^ A I™I? A ™^ ™ 

AAATQGC T fTTTGTb f rtiGATCTTGATCTCGTCTC TTTCCTTCT I'CTTTCGTT T( TCTCGGAf CACTf'A T B"0 
TATTCGAATGCCATGCATTTG rTGATGATGCGCACGATGG 1"C G C A C AC AC T AC A/.C AG A TA TG A rCiGTGG 700 
710 720 730 740 750 760 770
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C I GTGC.GTGGTGCAGAGCTCATCAATAAGrGATGGGGCACCGAGATG TGACTCC ICCCATTAI' rCGTCTT 770 
-t?X5«^^I?ZII rG-rc: TTGC3 TCCGTTGTCAG TCG AGCC TC A AGGC AGCC GTTCC G AGXG A TCGCO TG 340 
^.TAAGAAG : TGTAGGTGTAGAGGAAG^AACG I'CGGGTCCAAA rTTCAAGcqacatccgcqanat<"<i tnnn 9K> 
'JGGCCAAACGACCCAAACGTATGTTTCGAATGATACTAACATAACA fAGAACA F I FTC AG 900 
( J AC« J ACl.CTTGGctagaactag!:<jgatccgagccctcccatATGACGACGTCAAA TGTAGAA C TGATACC 1CG0 
1060 1070 1080 1090 1100 1110 1120
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AA J- ,A J' ' '^^ACTArCGACTGGTT-CTCAGC TTAT'AATG rGATCGTTCCGATCAACGAATTCTCGC 1 190 
r ih^IIr^™^ rG( 5CAAAAA r CACATCGAACCTGGATGGCCTCGAAACG TGTC rCGAC TACCT 1260 
^ AAAA ; C J^^I^CGACTCCTCGAAACTCACCAAAACCCAr A TCGACAGCGGAAAC ITGGGTGCAG IT 1030 
CT..vAGC TGC IC i TC'.TGCTC rCCACCTACAAGCAGAAGCTTCGGCAACTGAAAAAAGATC AGAACAAAT 1 'iOO 

1410 1420 1430 1440 1450 1460 1470
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i^^*^5 : ^~ rACCCACATCCAT r ATGCCACCCGCGGTTTC TAAAT TACCC TCCCC ACG TG^CijCC AC<3 TC •" /:) 

A 5rT A A^T^ A ^ AACrAACCCAAATTCCAACTrrCCACAAA ^ J5';0 

^ A $J^ AA 5 AA I A Z" AAAATrGATTCATCAAACiATT ' GG TATCAAGCCAAAGACG I ITGGAC ITAAACCAC 1610 

\liy£F^ C r C ^ 1680 
I AATGTTGuCTCGAC.GATATCCACA i CTGCGAAGAGCTTAGAATCATCATCAACG TACAGC rCTATTTCG 1 7FC 

1760 1770 1780 1790 1800 1810 1820
?^rK?Ji'^A$^^* J^^*C- C r, -= AAA A A CCTTCTAGACCACAAACCCAGCrAG ITCGTGi TGCTA 1820 
rilr IS a^JI^ GAAGCTCAAaGC TAGCCGCTCCGAAAGCCGTGAGCACCCCAAA AC ITGCTTC TGT 1 000 
^^AvTATT^GAGCAAAACAACAGCCCSA TAACAGCGGTGGTGGTGGTGGTGGAA TGCTGAAATTAAA6 960 
r T< IlrAArAAAr r-^TCTrrA ! ^ T I~'^ ^ A TCGAATAGCCCACAACCT ACCAGAAACGCGGCGGCGGr&C 20§0 
CTLAACAACAAACrTTGTCGAAAATCGC7GCCCCAGrGAAAAGrGGCCTGAAGCCtiCCGACCAGTAA«CT V1C0 

2110 2120 2130 2140 2150 2160 2170
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'^ATrrl a a^a^I^; A J? TCGAAGCTTT<3 TACGCCAAAAG rTTCCTACCGTAAArtCGGACGCCCCAA ic" ? I /O 

^■>-£lcr^^ 2310 
v-CGAl AGCCGl. AACACCGG rTTCTCCAAATAT rATCAACAAGCC 'GTTGAGGAAA;.ACCAACACTGGCAG ^'JO 
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TG AAA6GAGTGAAAAGCACAGCGAAAAAAGATCCACC fCCAGCTG TTCCGCCACG TGACACCCAGCCAAC 2G2C 
AATCGGAGTTGTTAGTCCAATrArSGCACATAAGAAGTTGACAAATGACCCCGTCATATCTGAAAAACCA 26Q0 
GAACCTGAAAAGCTCCAATCAATGAGCATCGACACGACGGACGTTCr.ACCfiC f rC::ACC FC FAAAATC AG ?6fiO 

; ir, r cccac r faaaa igac r fcaatccgac aaccaccaacgtacg atgttcttctaaaacaaggaaaaat v no 

CACA I'CGCC I'G rCAAGTCGTTTGGATATGAGCAGTCCTCCCCGTCTGAAGACTCCATTGTGGCTCATGCC 2800 
2810 2820 2830 2840 2850 2860 2870
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TC GGCTC AG GTG AC TCC CCCGAC A A AAA0TTCTGGTAATCATTCGC7GG AG AG AAGG A FGGGAAAGAATA ?£70 
AGACATCAGAATCCAGCGGCTACACCTCTGACGCCGGTGr TGCGA TG TGCGCC AAAA I'GAGGGAGAAGC r ?M0 
KA AAGAATACGATGACATGAC FCGTCG AGC AC AGA ACGGC TATCC TGACAACTTCliAAGAC ACiTTCC FCC 3010 
TT G T C G T C T GG AA 1' A fC C G A T A AC A AC G AG C T C G ACG A C A T ATCC AC G G A C G A T TT G T CC G G ACT A CACA .'JGMO 
;"GGCAAf!AGTCCCCTCCAAACATAGCGAC TAT fCCCACTTTGTTCGCCATCCC ACG TCTTC T FCC I'CAAA 3150 
3160 3170 3180 3190 3200 3210 3220
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GCCCCGA6TCCCCAGTCGGTCCTCCACATCAGTC6ATTCTCGATCTCGAGCAGAACAGGAGAATGTGTAC 32 W 

a a r rc rn fc c c a g 7G c c g a ac g a c : i c c a a c G rGGCGCCGCTGCCACC fcaacc r fcggacaaca r rcnc ro$o 

FA AG A FCCCCGGGA I AC TCA ICC TA r FC FCCACAC I' I A FC AGTGTCAGC FGAF AAGGACAC AA ! G I C I A I 3360 
GC AC TC AC AG AC TAGTC GACGACCTTC7TC AC AAA AACXAAGCTATTCAGGCCAA^TTCATTC ACT TG A F 3430 
C'G T A A AT G C C AC C T rCAAGAGTTCACATCCACCGAGCACAGAATGGCGGC TCTCT"GAGCCCGAGACGGG 3SO0 

3510 3520 3530 3540 3550 3560 3570
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TOCCGAACTCGATG rCGAAATATG AT FC TTCAGGATCCTACTCGGCGCGTTCCCGAGG FGG AAGC i'C I AC 3570 
rGGrATCTATGCAGAGACGTTCCAACTGCACAGACTATCCGATGAAAAATCCCCCf.CACATTCTGCCAAA 3*40 
AGTGAGATGGCiATCCCAACTATCACTGGCTAGCACGACAGCATATeGATCTCTCAATGAGAAGTACGAAC 3710 
AT6CTATTCGGGACATGGCACGTGACTTGGAGTGTTACAAGAACACTGTCGACTCACTAACCAAGAAACA 3780 

ggagaactatggagca r i g r r i"G a rc r r r r tgagcaaaagct tagaaaac fcac rc aacaca ftgatcca w>o 

3860 3870 3880 3890 3900 3910 3920
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rCCAACTTGAAGCCTGAAGAGGCAATACGATTCAGGCAGGACATTGCTCATTTGAnGGATATTAGCAAI'C 39?0 

ATCTrGCATCCAACTCAGCTCATGCTAACCAAGGCGCTCGTGAGCTTCTTCGTCAACCATCTCTGGAATC 3GG0 

AGTTGCATCCCA rCGA! CA FCGATG i'C A I CG I CG FCGAAAAGCAGCAAGCAGGAGAAGATCAGCTTGAGC 40(50 

TCGTTTCGCAAGAACAAGAAGAGCTGGATCCGCTCCTCACTCTCCAAGTTCACCAaGAAGAAGaACAAGA 4130 

ACTACGACGAAGCACATATGGCArCAATT PCCfiGA TCTC A AGGAAC FC F TGACAAC A r TGATGTGATTGA <*200 

4210 4220 4230 4240 4250 4260 4270
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G T T G A A 0 C A AG AG C T C £ A AG A ACGC OA FAG FGCAC lTTACGAA.TrcCGCCTTGAi:«ATCTGGATCGTGCC 42/0 
C(v::GAAGTTGATGTTCTGAGGGAGACAGTGAACAAGTTGAAAACCGAGAACAAGCnAT F AA ACt AAA G A AG '1340 
T GGACAAACTCACCAACGGTCCAGCCAC rCGFGCT I'C r rC CCGCGCCTC A A TTCC>\ G TTATC T ACG ACGA 4410 
!GAGCA rGTCTATGATGCAGCGTGTAGCAGTACATCAGCTAGTCAATCTTCGAAACGATCC FC I'GGC ! GC 4'U°0 
AAC rCAATCAAGCiTTACTGTAAACGTGG/CATCGCTGGAGAAATC AGTTCGArCG 'TAACCCGGAC AAAC 4b / ?0 

4560 4570 4580 4590 4600 4610 4620
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AG AT A AT CO T AG G A FA If . rGCCATG r C A AC C AG T C AG T C A TG C TG G A A A G AC A T " G A TG T T T C FA F i'C F '1620 
AGGACTATT7GAAG rCTACC i*A FCC Ao AA " rGATG rSGAGCATCAACTTCGAATO;; ATGCTCCJ FGA I I'C i 4SQ0 
ATCCTTGGC FA I'CAAA rTGGTGAACTTCGACGCGTCATTGGA6ACTCCACAACCA"GA FAAC^AGCC A FC 47G0 
CA ACTGACATTC T FAC F FCC FC AACTACAA TCCGAATGTTCATGCACGGTGCCGC aCAGAG TCGCO FACA -18X 
CAGTCTGGTCCT FGA 'ATGCTTCTTCC AAAGCAAATGATTCTCCAACTCGTCAAG ". CAA F 1 F I'GAC AG AG 4<30t; 
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AGACGTCTGGTG77AGC7GG AGCAACT6GAA FTGGAAAGAGCAAAC FGGCGAAGAC CC TGGC TGC TTATG 'W70 
I'A I'C FA f TCGAACAAA rCAA7CCGAAGA7AG7ATTG7TAATATCAGCATTCCTGA/i AACAATAAAGAAGA fSOUO 
ATTGCTTCAAGTGGAACGACGCCTGGAAAAGATCTTGAGAAGCAAAGAATCACGC/ TCGTAATTCTAGAT 5110 
AATATCCCAAAGAATCGAATTGCATTTGTTG TATCCGf TT (TGCAAATG FCCCAC f PC AAA AC AACG A AG 5130 
G rcc: A TTTGTAG TA FGCACAG FCAACCGAT A TCAAATCCC TGAGC TTC AAATTCAC CACAA F I* I CAAAA r 5?50 

5260 5270 5280 5290 5300 5310 5320 
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GTCAGTAATGTCGAATCGTC TCGAAGGATTCATCCTACGT 7ACCTCCGACGACGGG CGGTAGAGGATGAG h'SAC 
TATCGTCTAACTGTACAGATGCCATCAGAGCTCTTCAAAATCATTGACTTCTTCCC AATAGCTCTTCAGG 5390 
CCG TCAATAAI" 1* I I A I" IGAGAAAACGAATTCTGTTGA7G7GACAG7TGGTCCAAGA GCA FGC F FGAAC FG BURC 
TCCTCTAACTGTCGATGGATCCCGTGAATGGTTCATTCGATrGTGGAATGAGAACTTCATTCCAI'A I I I G 5530 
GAACGTGTTCCTAGAGATGGCAAAAAAACCTTCGGTCGCTGCACTTCCTTCGA6GATCCCACCGACATCG 6600 

5610 5620 5630 5640 5650 5660 5670
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TCTCTAAAAAATG6CCGTGGTTCGATGGTGAAAACCCGGAGAATGTGCTCAAACG7CTTCAACTCCAAGA fjft/Q 
CC ICG I'CCCGrCACCTGCCAACTCATCCCGACAACAC r TC AA rCCCCTCGAGTCGT TG ATCCAATTGCAT fwUf) 
GC TACCAAGCA fCAGACCA FCGACA ACA F F I'GAAC AGAAGAC fC FAACC I' fC FCTCGCC rCTCCCCCGCT 5810 
TTCCTTATCTTCGTACCCGTACCATGGTATTGATATCTGAGCTCCGCATCGGCCGCTGTCATCAGATCGC 5f)00 
OA PC rCGCGCCCGTGCCTCTGACTTCTAAGTCCAATTACTOTTCAACATCCCTACATGCTCTTTCTCCCT f^m 

5960 5970 5980 5990 6000 6010 6020
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GTGCTCCCACCCCCI'ATTTTTGTTATTATCAAAAAAACTTCTTCTTAATTTCTTTGTTTTTTAGCTTCTT t>0XO 
TT AAGTCACCTC7AACAA7G AAA7TGTG7AGA7TC AAAAATAGAA7TAATTCGTAA TA AAAAG TCGAAAA 60Q0 
MArrGTGCTCCCTCCCaXArTAATAArAArrcl'ArCCCMAATCTACACAATGTTCTGTGTACACTTC 6160 

r ;"a re ; r rr rr r rAcrrc fgataaattttttttgaaacatcatagaaaaaaccgcacacaaaa facct fa 6230 

TC A 7 ATGTTACGTTTCAGTT^ATGACCGCAATTTTTAT TTC r FCGCACG I'C I'GGGCC l"C PC A I'GACG I OA 6300 
6310 6320 6330 6340 6350 6360 6370 
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AATC ATGCTCATCC7GAAAAAGTT f'TGGAG FAT7TT7GGAA7T777CAA7CAA6TC; A A AG I IT A I'GAAA T 63 /<) 
TAATTTTCC TGCTTTTCCTTTTTGGGGGTTTCCCCTA7TGTT TGTCAAGAGTTTCC AGGACGGCG7T77T 64UC 
CTTGCTAAAATCACAAGTAT FGAFGAGCACGATGCAAGAAAGATCGGAAGAAGG77 TGGGT i TGAGGC FC 6510 
AGTGGAAGGTGAG FAGAAGTTGA7AATTTGAAAG FGGA37AG7GTC7A7GGGGT77 TTGCC ITAAATGAC 6580 
AGAATACATTCCCAATATACCAAACATAACTGTTTCCTAC FAG FCGGCCG7ACGGG CCC777CG7CTCGC 6650 
6660 6670 6680 6690 6700 6710 6720
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GC G T r rCGG I G A T G A CG G TG A A A AC C T C T G AC AC A TG C AG C TC C C GG AG A C G G T C / C A G C I" I'G I'C FGVAA B/'A: 
GC G G A TG CC CiGG AG C AG AC A AG CC C G TC AC GG C G C G TC AG CGGG T GTTGG CGG G TC rCCCCCCTCGCTTA 6/90 
AC TATGCGGCATCAGAGCAG ATTGTAC TGAGAGTGCACCATATGCGGTGTGAAAT/ CCGCACAGA FGCG !' 606(J 
AAGGAGAAAATACCGCA rCAGGCGGCCTTAAGGGCCTCGTGATACGCCTA f F7T7/ TAGGTTAATGTCA T 6930 
GATAATAATGGTTTC F7AGAGGTCACG FGGCAC77TTCGGGGAAA7G7GCGCGGAACCCC I'A I" rrGTTTA 700', 

7010 7020 7030 7040 7050 7060 7070
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TTT7TCTAAATACATTCAAATATG PA TCCGCTCATGAGACAATAACCCTGATAAAI GC ITCAATAA FAT 7 7070 
GAAAAAGGAAGAGTATGAGT A F FCAACAT7T0CG i'G FCGCCCT7ATTCCCTTT T [ \ GCGGCA I I I IGCCf 7rt0 
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CGRrCGCCGCArACACrAfTCTCAGAATGACTTGGTTGAGTACTCACCAGTCACAGAAAAGCATCTTACG /420 

GATGGCATGACAGTAAGAGAATTA rGCAG I'GCTGCCATAACXATG AGTGATAACAC TGCGGCCAAC V I'AC 7*100 

TTCTGACAACGATCGGAGGACCGAAGGAGCTAACCGCTTTTTT5CACAACArGGGGGA7CATGTAACTCG /660 

CCTTGATCGTTGGGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGC.GTGAC-fiCCACGA TGCC TGTA 7630 

GCAArGGCAACAACGTTGCGCAAACTATTAACTSGCGAACTACTTACTCTAGCTTCCCGGCAACAATTAA 7 700 

7710 7720 7730 7740 7750 7760 7770
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TAGACTGGATGGAGGCGGATAAAGTTGCAGGACCACTTCTGCGCTCGGCCCTTCCGGCTCGCTGGTTTAT 1 1 10 
TGCTGATAAATCTGGAGCCGwITGAGCGTGGGTCTCGCGGTATCATTGCAGCACTGCGGCCAGATGGTAAG /ls*K> 
CCCTCCCGTATCGTAGTTATCT ACACG ACGGGGAGTCAGGCAACTATGGATGAACC A A AT A G A C AG A TC G 73 10 
C rGAGATAGGTGCCTCACTG ATTAAGCATTGGTAAC Ttz TCAGACCAAG T t TAC'TCfi !*A FA PAC f r rAG A T 7980 
jr. attta MACTTCATTTTTA A TT7A AAAGGATCTAGGTGAAGATCCTTTTTGAT^ ATCTC ATCACCAAA #0^0 

8060 8070 8080 8090 8100 8110 8120
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AI'CCC I I'AACG I' GAG IT !" ICG f KCCAC I'GAGCGTC AGACCCCGTAGAAAAGATCAA AGGA I'C I \ K i \ GAG 8120 
ATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGCAAACAAAAAAACCACCGCTACC/»GCGGTGGTTTGTTT 8 I SO 
GCCGGATCAAGAGCTACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACT »'2m 
GTCCTTCTAGTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGC ACCCrCTACATACf. TCCCTC H'A'JiO 
TGCTAATCCTGTTACCAGTGGCTGCTGCCAG TGGCGA TAAG I CG FGTC T TACCGGC T^GGAC fCAAGACG 8*100 

8410 8420 8430 8440 8450 8460 8470
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A:"AGTTACCGGATAAGGCGCAGCGGTCGGGCTGAAC6GGGGGTTCGTGCACACAGCCCAGCTTGGAGCGA 8M70 
ACGACCTACACCGAACTGAGATACC I'ACAGCGTGAGCATTGAGAAAGCGCCACGCT TCCCGAAGGGAGAA 85'I0 
AGGCGGACAGGTATCCGGrAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGA<:<: f !"CC AGGGGGAAA 8610 
CGCCTGGTATCTTTA ! AG I CC rGTCGGGTTTCGCCACCTCTGAC T TGAGCG I CGA f f rTTGTGATCCTCG S680 
TCAGGGGGGCGGAGCCTATGGAAAAACGCCAnGAACGCGGCCTTTTTACGGTTCCTGGCCTTTTCCTGGC 8750 

8760 8770 8780 8790 8800 8810 8820
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crrTTGCTCACATGTTCTTTCCTGCGTTATCCCCTGATTCTGTGSATAACCGTATIACCGCCTTTGAGi'G 8620 
AGCTCATACCGC I'CGCCGCAGCCGAACGACCGAGCGCAGCCAGTC AGTGAGCGAGG AAGCGGAA3AGCGC 8890 
CCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGArrCAlTAATGCAGCTGGWCGACAGGTTTCCCG 89W> 
AC fGGAAAGCGGGCAGTGAGCGCAACGCAATTAATGTGAGTTAGCTCACTCATTAC GCACCCCAwCiC i I ! 5030 
AC AC riTAI GCTTCCGGCTCGTATG TTGTGTGGAAT TG I'GAGCGGATAACAaNTTTC ACACACGAAACACIi: 0100 
9110 9120 9130 9140 9150 9160 9170
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AAfiCTTGCATCCCTCCACGTCGACTCTAGAGGArCCCCGCJGAr rGGCCAAAGGACC CAAACgt atq 1 1 tc 70 
gaotgatactaacatoucaUigaacattttcaaGAGGACCCTTGGAGGGTACCGG'l AG AAAAAATG AG TA 140 
AAGGAGAAGAACT rrrCACrGGAGrTGTCCCAATTCTTGTTGAATTAGATGCTGAf Gl* rAATGGGCACAA 210 
ATTTTCTGTCAGTGGAGAGGG l'GAAGG IGA t'GCAACA PACGGAAAACTTACCCTT* AATTTA I* I I GCACT 280 
AC rCGAAAACTACCTGTTCCATGGgtaagt t taaacatat atatac tone, fcaaccc tgattcit ttaaot h 3G0 

360 370 380 390 400 410 420
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ft cagCC AACAC f l"GTC ACTACTTTCTgTTATGGTGTTCAATGCTTcTCgAGA VAC. CCAGATCATATCAA 420 
ACgGCATGACTTTTTCAAGAGTGCCATGCCCGAAGGTTA TGVACAGGAAAGAACT^ TATTTTTCAAAGA I* 4<30 
GACGGGAACTACAAGACACgtaagt ttaaacagt tcggtactaac taaccutacatat t taaattt trag 5B0 
GTGCTGAAGTCAAGTTTGAAGGTGA I'ACCC '* l"G I' TAA f AGAATCG AGTTAAAAGGT ATTGATTTTAAA^A 630 
AGATGGAAACA FTCTTGGAC ACAAATTGGAATACAAC7ATAACTC ACACAATG TA f ACA FCA rCGCAGAC /GO 

710 720 730 740 750 760 770
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AAACAAAAGAATGGAATCAAAGTTg taacs 1 1 taaac t tggac t tac taac; taai;gc;ctt tatat ttooatt 770 
t+.cogAAO r ICAAAA F TAGACACAACA TTo AAGATGGAAGCG T TC AAC TAGCAGACC ATTATC AACAAAA K40 
TACTCCAA I IGGCGA I'GGCCC I G I CC I'T f FACCAGACAACCATTACCIG TCCACAC AA IC ■ GCCC i I I CG 910 
AAAGATCCCAACGAAAAGAGAGACCACATGGTCCTTCTTGAGTTTGTAACAGCTGCTGGGATTACACATG DUO 
GCATGGATGAACTATAC AAATAGCA T TCGTAGAATTCCAACTGAGCGCCGGTCGCT ACCATTACCAACTT 1CfX", 

1060 1070 1080 1090 1100 1110 1120 
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G I'C I'GG I'G rCAAAAATAATAGGGGCCGCTGTCATCAGAgt acgi t toaac fcgagt fro tcir:bc:c:s<: baacaa 1 120 

gtuaUil ttaaat tt t cagCA I'CTCGCGCCCGTGCCTCTG ACT TC 1'AAGTCCAATT ACTCTTCAACATCC I 190 

CTACATGCTCTTTCrcCCTGTGCTCCCACCCCCTATTTTTGTTATTATCAAAAAAACTTCTTCTTAATTT 1260 

CI rrGTTTTTTAGCTTCTTTTAAGrCACCTCTAACAATGAAArTGTGTAGATTCAAAAArAGAAITAATT I330 

CGTAA I'AAAAAGTCGAAAAAAATTG FSCTCCTCCCCCCA TTAATAATAATTCTA f CCCAAAATC.TAC AC 1400 

1410 1420 1430 1440 1450 1460 1470
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AATGTTCTG i'G T ACAC~TCTTATGTT!TTT T I'ACTTCTGATAAATTTTTTTTGAA/: CA i'CATAGAAAAAA 14 A) 
CC GC ACA C A AAA T AC C T TA t'CATATGTTACGTTTC AGTTTATGACCGCAATTTTTATT PC T TCGCACG i* C 15"0 
TGGGCCTC rCATGACCi T CAAATCATGC T CATCGTGAAAAAGTTTTGGAGTATTTTTGGAATTTTTCAA PC 1610 
AAGTGAAAGTTT A I'GAAATTAA T I fTCCTGCT I FTGCTTTT I'GGGGGTTTCCCCTATTGTTTGTCAAGAG IbrtO 
i T rcGAGGACGGCCTTTTTCTTGCTAAAATCACAAGTATTGArGACCACGArGCAAGAAAGATCGGAAGA 1 /bO 

1760 1770 1780 1790 1800 1810 1820
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AGGTT iGGGTTTGAGGCTC AGTGGAAGGTG AGTAGAAGTTGATAA TTTGAAAG ! GG AGTAG TG ?'C I A I GIJ 1A20 
GG r TTTTGCC 1 fAAATGACACAATACATTCCCAATATACCAAACATAACTGTTTCC fACJAGTCGGCCCT i<W0 
ACGGGCCCTT fCGTC fCGCGCG TTTCGGTG ATGACGGTGAAAACCTCTGACAC ATGCAGC^ IwO 
GGTCACAGC T fGTCTG ' AAGCGGATGCCGCGAGCAGAC AAGCCCG rCAGGGCGCo CAbCQGGTuI Fo GC 20J0 
GGGTGTCGGGGC rGGCTTAACT ATGCGGC A FCAGAGCAGA F I'GTACTGAGAGTGC.ACCATA FOCGGTV, I (, 2 100 
2110 2120 2130 2140 2150 2160 2170
AAATACCGCACAGA
I ITTATAGGTTAA 
CCUiAACCCCTAT I 



TAAATGCT TCAATAA TATTGAAAAAGGAAGAGTATGAC ' . P * T"^~ x ( rr*T'- a\HA ->n*0 

TTTTTGCGGCATT I* TCCC i I'CCTr, I* -TTTGCTCACCCAGAAACGCTGG TGAAAG r \AAAuA i GCTuAAGA > 
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TCAGTTGGG I'GCACGAGTGGG TTACATCGAAC rGGATCTC AACAGCGGTAAGA l'C( . TrGAGAGTTTTCGC 2620 

CCCGAAGAACGT T I* rCCAATGATGAGCACTTTTAAAGTTCTGCTATGTGGCGCGG "AV TATCCCGTATTG 2600 

ACGCCGGGC AAGAGCAAC I'CGGTCGCCGCATACAC TATTCTCAGAATGAC r fGGT r GAGTACTCACC AGT 2660 

CACAGAAAAGCATCTTACGGATGGCAI'GACAGTAAGAGAAri'ArGCAGTGCTGCCATAACCATGAG FGA 1' 2730 

AACACTGCG6CCAACTTAC T I'C TGACAACGATCGGAGGACCGAAGGAGCTAACCGC F T T I" I'TGCAC AACA 2flOO 

2810 2820 2830 2840 2850 2860 2870
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iGGGGGATCATGTAACTCGCCI' TGATCGTTGGGAACCGGAGCTGAATGAAGCCA TACCAAACGACGAGCG 2870 
f'G AC ACC ACGATGCCTG7AGCAATGGC AACAACGT T GCGC AAACTATTAAC TGGCUAACTAC F TAC FC TA 2940 
6CTTCCCGGCAACAA I f AATAGACTGGATGGAGGCGGATAAAGTTGCAGGACCAC"TCTGCGCTCGGCCC 3010 
I* rCCGGCTGGCTGGTTTATTGC IGA I AAATCTGGAGCCGGTGAGCG rCGGTCTCGCGGTATCAT TGCAGC OOftO 
ACTGGGCiCCAGATGGTAAGCCrTCCCGTATCGTAGTTATCTACACGACGGGGAGrCAGGCAACTATGGAT 31*0 

3160 3170 3180 3190 3200 3210 3220
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GAAC6AAAI AGACAGATCGCTGAGATAGG I'GCCTCACTGATTAAGCATTGG l'AAC**GTCAG ACCAAGTTT 3220 
ACrCATATATACTTTAGATTGArrrAAAACTTCATTTTTAAFTTAAAAGGATCTAGGTGAAGArCC ITTT 3230 
TGATAATCTCATGACCAAAArcCCTTAACGTGAGTTTTCGrrCCACTGAGCGTCACiACCCCGI'AtiAAAAG 3360 
ATCAAAGGATCTTC T IXiAGATCCTTTTTTTC rGCGCGTAATCTGCTGC '< I'GCAAAC AAAAAAACCACCGC 3430 
TACC AGCGG TGG f I' I'G l* TTGCCGG ATC AAGAGC I'ACC AACTCTTT TTCCGAAGGTaAC TGGCTTCAGC AG 3500 

3510 3520 3530 3540 3550 3560 3570
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AGCGCAGATACCAAATA'J I G ITXTTC7 AGTGTAGCCGTAGTTAGGCCACC AC N'CaAG AACTCTGTAGCA Jt>/0 
CCGCCTACATACCTCGC rc I GC7AA7CC7G77ACCAGTGGC7GC7GCC AGTGGCSaTAAGTCGTGTC TTA 3640 
CCGGGTTGQACTCAAGACGA^AGTTACCGGATAAGGCGCACCSGTCGGGCTGAACCJGGGGG r fCSTGCAC 3/10 
ACAGCCCAGCTTGGACCGAACGACC"ACACCGAACTGAGATACCTACAGCGTGAG(;ATTGAGAAAfiCGCC 37Q0 
ACGC rrCCCSAAGGGAGAAAoGCGGACACGTATCCGGI'AAGCGGCAGGGTCGGAACAGGAGAGCGCACGA 3050 

3860 3870 3880 3890 3900 3910 3920
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GGGAGCI fCCAGGGGGAAACGCCTGGTATC r TTATAGTCCTG rCGGGTTTCGCCACCTC iG AC I TGAGCG 3920 
rr:GAI iTTTGTGATGCrCG IXAGGGGGGCGSAGCCTATGGAAAAACGCCAGCAACGCGGCCTTT77ACGG 3990 
TTCCTGGCC r rr fGCTGGCC. TTTTGCTCAC ATGTTC r I'TCCTGCGTTA TCCCC TGA77C7G7GGA7AACC 4060 
GTATTACCCCC \' TTGACTGAGCTGA IACCGC7CCCCGC AGCCGAACGACCGAGCGOAGCGAGTCAGTG AG ^ 1 30 
CGAGGAAGCGGAAGAGCGCCCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCCGArTCATTAATGCAGC 42C0 

4210 4220 4230 4240 4250 4260 4270
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TuGCACGACAGGTTTCCCGAC [GGAAOGCftflGCAG TGAGCGCAACGCAATTAATOTGAGT f AGC I'C AC It '11170 
AT TAGGCACCCCAGGCT I" I'ACACTTTATGC T PCCGGCTCGTA Tfi I* IG ■■'G7G6AAT TGTGAGCGGA I'AACA 4340 
A-TTCACACAGGAAACAGCTATGACCArGATTACGCCAAGCTqkaaqt tta«acacgak<: t tactciacta 44 IC 
qc taUc tcatx tciaot 1 1 tcagAGCTTAAAAAl GGCTGAAATCACTCAC AACGATGGA FACGC FA AC A A 4480 
C T TG G A A A fG A A A f 4494 
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CiGGCGATGQCCCACTACGrGAACCAl'CACCCrAATCAAGTTTTTTGSGGTCGAGGTSCCriTAAAarACTA 7K0 

aatcggaaccctaaaggga6cccccgatttagagcttgacggggaaagccggcgaa:gtggcoagaaagg nr,o 

360 370 380 390 400 410 420
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AAGG6AAGAAAGCGAAAGGA6CGGGCGCTAGGGC6CTGGCAAGTGTAGCGG TCACG ^ I'GCGCG TAACC AC 'i20 
CACACCCGCCGCGCTTAATGCGCCGCTACAGGGCGCGTCCCA r TCGCXA r IT.AGGC rGCGCAAC 1G I iT,G «W0 
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CTGCAGGAATTCGATArCAAGCTTATCGATACCGTCGACCTCGAGGATCAGAAGAAATTGGAGCAACTAC / /O 
CCACATCCATTATGCCACCCGCGGr I' IC TAAGTGAGTTTAAr T r rGAGTTTACGACTACAAAAATGTG r \ 340 
CTTTAATAACTATCriCGACTTGAGTCTATTCTGTATCACTAGTTGrrGAGTGATTTTTCATTGAGAAAA 010 
TATrAAAAGGAACATTATTTACTTTGCrTATTTGCCCTAACrTTGATTTAGTTTTTCGArCAACTAGATC QflC 
rTACAAAACTTGCAATACAATTCCAlTTTCAGATTACCCTCGCCACGTGTCGCCACSTCAGCAACCGCTT 1060 

1060 1070 1080 1090 1100 1110 1120
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CAGC AACTAACCCAAATTCC AACTTTCCACAAATGTCAACATCCAGGCTTCAGACTCCACAGTCAAGAA f ! 1?0 
AiCtiAAAAl rGGTAAGAATTTrA»TTTGAGCTCAAACTTGTATAAAATGCCCAaAAAAGAAGATGA I AAA 1 ISO 
AATGTAGTTT r f I'TGCAAAACTTCC ACCTTTATTGC fCTAATATGACGGC TTATATCTCAA r TTTCTTGA 1260 
GTTTTATCAAAAAATTTTCCACTA I* AC AAA TG TAG AAA AG TAT TT TGC AC AAA TT T TG rCAGTTGACASC 1300 
TTTGTAA7AGATCCAAATGGAACCTAGATACAAGCTG I' rAAAGTGGAAGGAGCGCAAGTCTATACTGGAA 1 -'iCO 
1410 1420 1430 1440 1450 1460 1470
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ATAA I GATCTGAAACAAATTTGTGCTATTC TCAAATG TTT AAGAC ATGTTTTGAAG ATTTT 1' i'CAAATTC 1470 
GC AC TAG TTTCAG AACC TTC CT r r TTGTATGAAAAAG TAAAAAAAAACTATTTCAAACCC I'CACCGCCAC :640 
CATGTT IX'AACTCTTAATTTTTATAAAATTTTGCAATTTACAAATCGCC TCCCCT'f GCCCGAAAAG TGCC 16:0 
CACCAAAA rCAATTTCTCGGC T fCATAATGACTTTTAAA fTGATGTGAGAAAACAC AGAAGAGGC I'AAC T 1630 
AAAI'TGACAGGGACAGGTTGTrCCTCTTCrCCCTCCTTCTCCCGCCTCCTCCTCCCC I ICCATCTCCAAC I750 

1760 1770 1780 1790 1800 1810 1820
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AACAACAATTTTCCAAI fTCGT IGTCCA i'TTTGC PT ATAA AC A 1 I FGTGTGTGGAAGGAAAC I'ACACGGG I32C 
GAGACGGTCAATTAATrCGAATSAGAGCATaGCAATTACTCrTTCGGAAATTGATCAAI-AAAGATAGAGC 11M0 
CGATGACACTGGCTGG IAGTAG TATGAGTGTAGAA i'TGCT T ITTCATCGTCTCAACTTGCGCATCAGTCT I960 

rCCCCCGCTCTCATCACTGACAATTAATGTCGGGTTTTATGCGCTCTTTCCrATTCCGCCACrCATTCTG 2030 
GGT f ACCACAAACTGGAATACATTTTACTACTATTCAAGCCATT rATTTTGATAT"TAA i'TTTGTGCAA I' 2100 
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ACATTCAGCCAG r rCGTACAATTTTCCATGCTTTTTGGCCCATTAAAAAAC ITTCTCACCTCTTCA ICCA -5G?0 
TC TC ACTCG TA TCA I'AAAAAG I'A I'AGCAAAAGCCCGAC TC TAC TTTTTAA6AG AAGGAGATAC TGAGCT A 26QO 
OA IGGCG FG TGACCCTTTTCATCTCGTCXGTTCGGTCTCAAATTCACGC FCATACT AACTC TTf AAA TAG 26*30 
CCATAGACC TCCTTGTTTTCTTCTTCGTTTTGACTCGCGCCTATTTTTTGrGGCTGCCTGAAAGrCGGGA 2730 
AAA rfTAGTATATTTATGAGCTTATCTTTATGCAATACATAAAAAACGAGGCAATTTAAAAATA I I'AAAA 2800 

2810 2820 2830 2840 2850 2860 2870
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iTAATGAGGrTCTAGATGTAGATTTGGAAAAGAAGAAAAAAACAAAACAAATAGGAACCGCCAGATCAAA 2070 
ATTCTATTTAAAGCi F I TTCAAGATGTTTAGGC AAGATTCGGCTGAACAGAAAACTGAA6TGCC TOCATAA 2040 
AT C T AGTGT AAC G T T7AG ATTG AAC TC GG A AA TCC TAA GC C rGAACTATAGCCTTATTC TAGA IC I" TAGT 30 10 

tgcgcataagctcaagcc(:aagc;agaaatgacttgcatttagtttaagcc IAGATTGACTTGCTTGC I IC 3080 

AG TCTAATCCAGACTAGATTTCCAAGAGAG TTTTCAATTTTAAATGTTTC.CAG I' r IX" F TGTTACTTAAAA 3)50 

3160 3170 3180 3190 3200 3210 3220
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TCI lAArGCCCTGTGATGCGTAAAATCGTTATCCC rrrcrCTCACACTTTCAATTACAGATTCATCAAAC H220 
ATTGGTATCAAGCCAAAGACGTCTGGACTTAAACCACCCTCATCArCAACCACTTCATCAAATAAIACAA 3?9Ti 
ATTCATTCCGTCCGrCCiAGCCGTrCGAGTGGCAATAATAATGTTGGCrCGACGATArCCACATCTGCnAAX 3^60 
GASCTTAGGTATCCGA ICC T rcCGGCTTCTTTTTAGAAATTATATTAI* I* f CAGAATCATCATCAACG I' AC 3430 
AGCTCTATTrCGAATCTAAACCGACCTACC^CCCAACTCCAAAAACCTTCTAGACCACAAACCCAGCrAG 3600 

3510 3520 3530 3540 3550 3560 3570
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TTCGTGrTSCTACAACTACAAAAATCCiGAAGC rCAAAGCTAGCCGCTCCGAAAGCCS TGAGCACCCf AAA :$^/0 
A^TTGCTTCTGrGAAGACTATTGGAGCAAAACAAGAfiCCCGATAACAGCGGTGGTGSTGGTSGTGfiAAr^ 3600 
C rGAAATTAAAGTTATTCAGTAGCAAAAACCCATCTTCCTCATCGAATAGCCCACA ACC fACGAGAAAGG IJ/U: 
CGGCGGCGGTGCCTCAACAACAAACTTTGrCGAAAArCGCTGCCCCAGTGAAAAGTriGCCTGAAGCCGCC 37A0 
GACCAGrAAGCTGGGAAGTGCCACGTCTATGrCGAAGCTTTGTACGGTGAGTATTTriAAArCGGAAArTG HflfsO 
3860 3870 3880 3890 3900 3910 3920 
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GAAA rGTA T TTTTTAAAAACTGAAAATTCTAC AAAATAAA I' I'AAA AA r AAGATTTT TTCCTCTCATAG I'A 30?0 
TTGCATCCCACTATTTTAC !' r rGAAGATTTATATCTTGGTTCATATTGAAGATATC AGATA I AG AAA A AG 3S90 
AAATAAAAAATATTTTGACAGTTGATAAT : " I" I' fG rAI'AGGACCAAAGACAAGTGAG ATATAAGCTGTCAA 40G0 
AGTTGATTTTCAAGAAATTTTAAAACCCTAGTTTTGCGAAGCTCTGGGCCTCATCTATATTTAnAACCCG 4 130 
A7TCGTAC I' IX !' iCCGT 7 CC 1TGAC TCTACCAAAACCAAAACCAACCTACTAATAA UATGATGAGACAA 42C0 
4210 4220 4230 4240 4250 4260 4270
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TTGGGAAT fG I C i CCCA I I* I I C f IC I r IC VCC r I'GACAC I C T T l"C AGAATCTATGT :CCATCTTTTTGTC 4270 
U i l"G I"G'f"CCCCCCATAAAGACTCTTCGCGG AAAAATGTTGCAACGGAAGTGATATT :CGAGCA I' IT I rCG 4340 
ACGTO.IAGGCCCGAAAAACACA fCTGGCTG ACAAAGAGTAAAGCAAfT TC fCAGCT "TTCTTCGCCGGTT 4<*!o 
T i TC AATTC GT TTTTCA AAA TGAGCTACTAC AG AG TGAAAGAGC AC AAATTGC AAA \CATT IT IG I'G 1 GA 't'JQO 
iiA I'GCAC TTTTTGAAAAATTAACTTTACGTTT TCAG T TTC TAGTATTTATTTTTTT 1ATAT AAATTAGAO 

4560 4570 4580 4590 4600 4610 4620 
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C i" I C r I'AGACC ! GC J'A I'A I' f 1* I' f I'AAA ACT I'CC TAC fG A A A T A T A C G AG A f IC I I" i' I'GAC f r i'CCGGAAT 4620 
I'G TC vrArGGCTTCTArrArTTATGASAAAACATTTTTTAAAAATTTTTTTGAAAAUAAACGrsCArC 4830 
TTCG7TTTTTACATAGTAATTTCCAGCCAAAAGTTTCCTACCGTAAAACGGACGCC ;CAATCATATC T CA 4 /tfO 
AC A AG AC TCGAAACGATGC TCAAAGAGCAG "GAAGAAGAG I'CCGGA IACGC IGGAC rCAACAKCACG rCG 4flX 
CCAACGTCATCATCGACGGAAGGTTCCCTAAGCATGCATTCCACATCTTCCAAGGT rCGTTGTTTAGGAO 
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AAC7C777777777G777TGG 7GACCT FCACA7AG 7C7CGGA76777A7AAAAGTGAGG FC FC7G0GACA 4970 
CCTGCCATAAAATG T GAA TCCGCCC A V "* FG r rGGTACAAAAAACTTTGCAGAGCACC TGrr rTATAC ATT Rijtr 
TTTAGGATAAA l"G 1 C A I'ACGG TAT F7GTCAAACCC AAAC77777AAA7777A F IV ITAGA7CAAAAA7TG <51 C 
A FG rTAAAAGTTTAAGATATTTACGAAAAAATGTTTTACTTAAAAC r T rTTTATATGGATAAAATTTTAG 6 10c) 
AACA77CAGA FAGGAG77CCG7CCC TAAAAC7777G7GG7CGCCGAGAAGC777C6AA77AA7AA I C TCA Sl^O 

5260 5270 5280 5290 5300 5310 5320
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]* f | |'A rAAG rTCGAAACTAATrTTTTGTGCAACTG AATTT l". r A GAG ATGA AAC TTT AA G T T I r.AA FT7A7 
CCCA I' r rGAAACCGTCCCCTTCTATAAAACTTCAAAAr F r TCAGAGTTCAACGTCAGACGAAAAG7CTCC b'jyo 
GTCATCAGACGATC I' I'AC rCTTAACGCCTCCA7CGTGACAGGTA7CAGACAGCCGA7AGCCGC AArArrG W< 
G : :*rCTCCAAA7AT7ATCAACAAGCCTGT7GAGG7GAGTATT7TTTGT7TCTGGG r AGAGGCT7C77G TC 5*nfi 
AAG777GGCC7AAA77A TAT I'AAC I" rGGT77AGAGGCTGGCAAAGCCAT7GA7CAASCATG6GCTAAAC r 5600 
5610 5620 5630 5640 5650 5660 5670
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GGGCCGGCCTGAACCA7GTACA l'A!TC rTTGCCCCGAGTAGTTGCAATCTAAAGAT rCGAAGCTGGCHC 5670 
AAAG ICGGAC7AGGCAAAAG7GCAAAAA! GGAAAA FATC rTGAAATTCAATACGC7 r TCCG7C777CC A7 57W 
rC77CC7T7777GTGG7C7777GT7GCAGA7777CCC777777AGA777rAAGA77T7GA7Ar;AC77i AA 58m 
70 TC7GGC f FCGCCTTCC7AAGAGCC7TCC 7A7AT V r PCGAAAAA7AA7C AA7T77 TAGGAAA AACCAAC 'itibH 
A 7 G G C A G 7 G A A A G G AG T G A A A AG C AC AG C G A A A A A AG A 7 C C A C C 7C (? AG C T G 7 7C I G CC A C G 7G A C A C C" 5QG0 

5960 5970 5980 5990 6000 6010 6020

CAGCCAACAAfCGGAGr mrrACiTCCAATTATGGCACATAAGAAGTTGACAAATGGrACGTTTATTTC I'G 6020 
AAC I I ! AC f I'A FG rTTCGGTCGGTGACSTTTTTCTTGACCATCiTCiATtSG^ AAG TAA 1T777GGA77A777 6090 
AAAS7TTG7GGGGGAA TAG f AAGGAGG AG7ACAA77A7777A777G7AGA AGG7TC AG TAAAACTTTGAT 6160 
r r F7C7GACCA7AAG777777T TC FGAAAG T 7G T77AAAAAA77CAG77A AAAAA7 ^AAA I AA I'AC FC 7 A 62»> 
7AAAAAI* f ! C I AAA F7C 7 7GGGAA7777777CAAAAA7G77777CCAAA f'A TG TC 7 ^A7AG7AAGA777C 6300 

6310 6320 6330 6340 6350 6360 6370 
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Ft' FG7GAA777ACAAAACA7A7777AAAA i'ACATTT I'A AT F FA F FC6A777777CG F7CGCAGGAACG AC 6370 
AAAAAA7CAGAAAAAGCGAAA7T7AA77CC AAAAAAAA7A7777GAAAAC FCACAA % TAAAGC7ACTTT7 bUa; 
CAAAAAA TCAACAAAAAAAA7A7CAAAAGA77CA7A7777CAGAAA7AGiiGACA7A ;rCAAAAAC I'A FCA 66 IC 
AAAAA77CAC7A7777CCGGAAAAAAA77G AGAAAAA7TCCAAAAA CFG f AAA AAA ^AAAA77GAGAAAA BMiO 
A 7 I'CCAG AAA7TGAAAAAAAATTCCTTTGAGGAAAATTTA AAAATTTTAAA7GTGT TiATTTCTGAA ACCA 66F0 

6660 6670 6680 6690 6700 6710 6720 
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AGCA77T7CCGAC7777C5GCGAT777CAGAC77GGGCTA7AAA77777G TCAAAA '1 f AGGAA I C F TAAA 67?0 
Ai AF rTGTATTTTTCCAAGAATGCTCCTCAATCTCAAAT7CATATTTTA7AATTTr. \GACCCCGTGATAT 6 /UO 
C i GAAAAACCAGAACC "G AAAAGCTCCAA TCAATGAGCA TCGACACGACGGACG 7 T JCACCGC F7CCACC 6800 
7C7AAAA7C AG77G77CC AC77AAAATGAC7TCAA7CCGACAACCACCAACG7ACG VTGTTCTTCTAAAA 0020 
CAAGGAAAAATCACA7CGCCTGTCAAGTCG T TTGGTCAG7GCACCCCCCACCTCCAMTATTAT'oACAAA 7000 
7010 7020 7030 7040 7050 7060 7070 
TGACCATTTTGCAGGATATGAGCAGTCG7CCGCGTCTGAAGACTCCATTGTGGC"C ATGCG TCCGC T CAG 7070 

G I GAC7CCGCCGACAAAAAC7 7C7GG I'AA I'CAT FCGC TGG AGAGAAGGAFGSGAAA ;AA FAAGACA • GAG 71 

G I' AAA f F7 TGGAAAC TT FGA T FT F T F FTG f TGAAA AATAGCT7CAAA77ATAAA7T rTAAAAAATCCCAG W C 

AA A A A 76 AT G 7 7 7 G T C A A AG 6 A A A A C 7 77 7 G 7 7 7 7 77 GG 7 F 7C T G A A C T G r 7 FCG r f FA A AG r FA AC C A 7?P0 

ACG T7GGAGC FCG IACCAAAAAC77777C7T77^A7AA77 777GAA7C7A7AG7A7 TT t A A 7 T "7GG A A /3:;0 
TTGTGAAAG I" (C CCTTGAGATGTAT TAAG I T ITAGGCA7AGGCAGG I'GTGTAGGC/ GAAA6G UTCATG r 7'|?C 
AGGC AGATAGGCTTGA I A ITTACCAAGCCAATAAACACi TAAATAATATTTAAAAA/ AAACACl'GAATAAA 7400 
ICAAAAGCTAATAAI I AT rGTTTATTGGACCTACCAACACCTTACATTTGCCTAC/i J'GCTTACCTA X ! CC 760O 
ITGTTGTCTACAlTTTGAACGTTAATCACTAATTCnGTGAArGAACACTTGTAGAI r TTTAATTTCGACA /fi'ttj 
G "AATTTTTGAGCACATTGGCGTTAGAA IT TGAAAAAAAACCTTCRACACTTGAAT CCTCATAACTC I CA 770O 
7710 7720 7730 7740 7750 7760 7770 
AAATA CTTCAGAATCCAGCGGC TACACCTCTGACGCCGG TGTTGCGATGTGCGCCA AAATGAGGGAGAAS 777C 
CI GAAAGAA rACGATGACATCACTCGTCGAGCAGAGAAC.GC.rrATCCTGACAAG fi AGTTTTGG I'ACj TAG 7H'\Q 
TAGTTGTA6TCCCTTGACACACATATGAACACA rrCGCTGCTCGTTTGGGTGGTCAGGGAGCCATGGAGC 7910 
AATTAATCC AGAAGGCTCAAAATTAAi'GAGCATCACTTGGTGA rCGAGGAATCCCCGAAAGACG": | TGAT 70IJO 
AGCATCTTCTTC ITTTGCATTCTTT !' I'CTCTCTCTGCTGGGAG rCCCTGTTACACAGACATCTATTCA TG 8060 
8060 8070 8080 8090 8100 8110 8120 
CGCoAAGTGCAAT'f I' FGGTTCCTAAAGATGAGAGGAGGAGGAGGAGCTATGAC F T A TG A A T G A GC A T C Ci A 3 1?0 
CGAACCGCQT0AAATAGTTTQGAGCTTA6ATTGCAAATTACAfi6AATATTCC6G FCiACCTCAGTC rAC FT 61 W 
GGCATTTGGCGGCAGTTTGGTAAAACrTGAGCCAAATTTrGrTTGGGTCACGCGTAATTTTCAAACAGTC U2SQ 
CiC AAGAATTTGCTCAACAAG TCTTGCTGGCCCAAG TC 1TTCTAGAGTC rG rACCAASCTTGGTCCAATAC 8330 
I I rrrGGGCCAAACTTTGGGAAGAAGTTGCCGCCAAATGCATTTTTTCAAAAATACTAGAACCAACATC^ O'lOO 
8410 8420 8430 8440 8450 8460 8470 


GGTTTAGCGCAAGTTGTCTCCTAGGAGGATATCCAAAFGT f FATTTACTCTC TCTC TTTC T A ATO A AG AT 8 '1 70 
CTArCGGTGCATCTCArAATGGAACCGATGCGGTTTTGCGCGCrrrGAAGAGCATl I'CTTTGCTTT i'G I I* 3^40 
GC rrrrTTGTTATGCCTCTGAATTTGAAACTAGTTATGAAArrTCAGATGTrrCACTGGAGArGGCCAAG 8610 
ArAGTTGGTATTACGAAAGTTTCATGAArTTTGAAATTCTGTAGTTCTGTGAGTCTjCArACTATAGCrc A6KO 
AAAAAGGTTCCGCTTGAGCCCTGCCTAAATTCAAAATTTCCrTTCAAGTCCrGTCCATTTAAr (TACCiTT 8750 

8760 8770 8780 8790 8800 8810 8820 


AFGAAATAACGCGAACC rATCATTCTTCT TCGTCTCAATTTC rTTTTCATGTCACC ICGATCC I G rGGGT 88?0 
TTCAATCTrcrTTCAATTCACCCTAGATTTGGCGCACAAGCTCCCCrCTCTCTCTAXrGCAGCTGrcCTC Wi90 
nCGCTTCCATTT I'TGGCGACTTGC I' rCCTTCCCTTCCCTTCCGTCGCCGGTTTCCT I'CXGATTGCC r FGT HWQ 
G r rCCTATAA r FAATTCATTAAGCCCAGAACAAACGAACGGGG fTTCTTTTTC ICC rCTTACC ATCTTTT 9030 
TCGCGGAATTGAA'TCGT IT TGTGATAGTGAAG TGTTTGTCAAGAATTTTTGG I" XT rTGGTAGC I' FGCCA U100 

9110 9120 9130 9140 9150 9160 9170 


AA rTATTGTTAGTATGCACCATGTTSCTCAGCATTGTGTTCAA FATCTTGGTT rTATTTCAAAATTTG T I" 9170 
VC A TAAAAAC T F FFGTAGCC AGG F F rTCGTGGCGGGACCCACTTTTrCAAACTC fAiiCCAAAAA I G FGTG 40 
ICAATTCGSATCTCAAArrATCTCAACTATAACCTTAGAACGTTTTTTrCAAAAAT-TCCGAAATCArAr It; 
TT , p rCGAAATTCCAATT f FTTTAAAGAT rAAACTGCCAAAT f I GAATTCCA TGaiCCTTCC A I* TTQACTC 93B0 
CCATCATGAC I ACGTTCCGTGGG r^TCGCCACAAAAAfTACAGTTCC I'CGGAAGGT" 1 I fCrGGCGGGGC OUSO 
9460 9470 9480 9490 9500 9510 9520 




C • AGCAG AGCCCATAG6 rGCCAAAGC I'CXCGATGAGC rGATAAAAAATGTACT I'C r^AGGAA 1* A r TATGC 05?O 
AAAAA rAATrTGGAGCTTTTTTACTAGAAACAGrTTAAGAAAGAAAAAAG5rTTTT"T I i AA TTTAAAAA 9600 
rTCAAATTTTGAATATTAAAGCCAAATTTGAGTTGATCAC ITCGAGAAAATTCAAA*,AT CGAAAGCC TAG 0600 
ATT i T FGAl GCAA'riTATCTGGAAAGCGAACTTCTAAA T TAGAAAAAC !' FA TAAAA^AATCTAAA TGTTT "9730 
GAAATTTTTTATTrGAAATCCTAG f-GACTTTTT f rGATTTTTCTATTTGTTTCCAAGCTA FAAGTCT i F 080<) 
9810 9820 9830 9840 9850 9860 9870 


: jAAGTCGCCTCCTCAACCTCAAACCA3TGTGCCTCCATATTTGGAACACACACAAGCAAAAACCAA i I GA 9870 
TACTATGTG : TCGAGTAGCCACTTG ACAAGAAGAAACTTGCCGACAC I'GG fGGCTGG TCACCATTCTCCT 9940 
CTCTTTGTCArrrSCArAAlXTTrCTCCCTCTTCCTCATAAArAACTAAACTGTGTCTCCTOCGCGrrrr 100 ! 0 
CCGCTCTCGAGGGGGGGCCCGGTACCCAGCTTTrGTTCCC IT TAG fGAGGGTTAATTGCGCGf TTGGC G I 10080 

aatcatggtcatagctgtttcctgtgtcaaattgttatccgctcacaattcc:a(:ac:aacatacgagccgg ic;so 

10160 10170 10180 10190 10200 10210 10220 


AASCATAAAGI'G I'AAAGCC rGGGG rGCCTAATGACTGAGCTAACTCACATTAATTGCG T fGCGCTCACTG 10SW0 
C C C G G TT TC C AG TC G G G A A A CC T G TCG TG C C A GC TGC A 7 T A A I* G A A TC GG C C A AC G CG C GG G G AGAG G f S G 10^90 
GTTTGCGTATTGGGCGC fC !' I'CCGCTrCCrCGCTCACTGACTCGCTGCGCTCGGTCGTTCGGC IGCGGCG lO^Br, 
AGCGG I'ATC AGCTCACTCAAAGGCGG T MTACGGTTATCC ACAGAATCAGGGGATAACGCAGGAAAGAAC lO'ho 
A T GTGACCAAAAGGCCAGCAAAAGGCC AGGAACCG TAAAAAGGCCGCGTTGCTGGCG 1 I I i rccATAGGC 10!>00 
10510 10520 10530 10540 10550 10560 10570 


TCCGCCCCCCTGACGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTAI A U'^m 
AAGA rACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTC !"CC rGTTCCGACCC TGCCGCTTArCG^A 108U0 
! ACC rCTCCGCCTTTCTCCCTTCGGGAAGCGrGGCGCl' rrCTCATAGCTCACGCTGTAGGTArciCAGrT 1071M 
CCGTGTAG6 ICG F rCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCG T rCAGCCC-IACCGCTGCGCC r f 10780 
A~CCGCTAACTATCG r C TTGAGTCC AACXCGGTAAGACAC0ACTTATCGCCACTGGCAGCAGCCACTG6 I* 10050 
10860 10870 10880 10890 10900 10910 10920 

AACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGC f AC AG AGTTC TT5AAGTGGTG 3CC I AAT rAfGCCT 10<J20 
AC ACTACAAGGACAG I A V 1* rGGTATCTGCGCTC TCCTGAAGCCAGTTACC I* fCGGA AAAACAGTTGGTA6 10<W0 
C i'C fTGATCCGGCAAACAAACC ACCGC I'GG rAGCGGTGGTTTTTTTGTTTGCAAGC AGCAG ATTACGCGC 1 lOBO 
AGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTTTCTACGGGGTCrGACGCTCA iTGGAACGAAAAri 1 mo 
CACGTTAAGGQATTTTGGrCAfGAJiAr TA TCAAAAAGGATCTTCACCTAGATCC H f f AAA TTAAAAATG I 1200 
11210 11220 11230 11240 11250 11260 11270 


AAGTTTTAAArCAArcrAAAGTATATATGASTAAACTTGGTCTGACAG rrACCAATXTTAATCAGTGAG 1 !?70 
6CACCTATCTCAGCGATCTG I'C ! A I rTCGTTCATCCATAGTTGCCTGACTCCCCGT :GTGTACATAAC l"A 113'^. 
CGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGACCCArGCTCACCGGCTCC 1 \^\0 
AGATTTATCA6CAATAAACCAGCCAGCCGGAAGGGCC6AGCGCAGAAG f G G rcCTG lAACTTTATCCGCC 1 1'lftf. 
ICCATCCAGTCTATTAATTG I' I'GCCGGGAAGCTAGAGTAAGTAGTTCGCC AG 1TAA rAGTT 7GCGC AACo 1 1GCO 

11560 11570 11580 11590 11600 11610 11620 

rrGfTGCCA rrGCTACAGGCATCGTGGTGTCACGCrCG rCGTTTGGrATGGCTTCArrCAGCTrcnr.TTr 1 16?C: 
CC A AC G A T C A AG G C G AG T T A C ATG A TC CC C C A TG T I'G fGC AAAAAAGCGGTTAGCTCC i I*C(iG FCCTCCG 1 1090 
ATCGl (G rC AGAAGTAAGTTGGCCGCAGTG 1TATCAC I CA rGGTTATGGC AGC ACT'if A ;'AA I rr rrTTA I I /(in 
C I G rCATGCCATCCGTAAGA I'GC T T FTCTGTGACTGGTGAGTACTCAACCAAC FCA rrcTGAGAATAG I C 1 WOrt 
T ATGCGGCG ACCGAGTTGCTCTTGCCCGGCG I (J A A . r AC G G G A T AA TAG C G C GC C AC A r A G C AG A AC T T T A 1 1900 

11910 11920 11930 11940 11950 11960 11970 


AAAG TGC TC A TC A T TGGAA A ACGTTCTTCGGGGCG AAA AC TCTC A AGG A TCTT ACCGC TG T I'G AG A rrPA 1 1<J/0 
C: TCGA I G I AACCCACTCGTGCACCCAAC i'G A rCTTCAGC ATCTTTTACT J TC ACC . \G C G T TT r TG GG I'G 1 ?C'*0 
AViCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAAnii CGAAT ACTCATA ^I'O 
C.C rrCCTTTTTCAAiATTAf ! GAA6CATT TATCAGCCTT ATTGTC TCA rGAGCGG.M ACATAT f TGAAT 1210 J 
(j A! rrAGAAAAATAAACAAATAGGGGTTCCGCGCACATTTCCCCGAAAAGTGCCAr I2'>37 



Enzymes : 100 oM46 enzymes < 

Settings: Linear. Certain Sites Only. Standard Genetic Code 

GACGGATCGGGAGATCTCCCGATCCCCTATGGTCGAC TCTCAGTACAATCTGCTCTGATGCCGCATAGT TAAGCCAGTATCTGCTCCCTCCTTGTG TGTT 

crGCCrAGCCCTCTAGAGGGCTAGGGGATACCAGCTGAGAGTCATGTTAGACGAGACTACGGCGTATCAATTCGGTCATAGACGAGGGACGAACACACAA ^ 
T 0 R , E 1 S , R 5 p M V 0 S Q Y N L L . C R IVKPVSAPCLCV 

GSAGGTCGCTGAGTAGTGCGCGAGCAAAATTTAAGCTAC AACAAGGCAAGG 

CCTCC AGCGACTCATCACGCGCTCGTTTTAAATTCGATG TTGTTCCGTTCCGAAC TGGCTGTTAACGTACTTCTTAGACGAATCCCAATCCGCAAAACGC ^ 

ggr .v vre on lsy nk arldr q lheesa.g.afc 

CTGCTTCGCGATGTACGGGCCAGATATACGCGTTGACATTGATTATTGACTAGTTATTAATAGTAATCAATTACGGGGTCATTA 

gacgaagcgctacatgcccggtctatatgcgcaactgtaactaataactgatcaataattatcattagttaatgccccagtaatcaagtatcgggtatat 

A A S R C T G Q I Y A L T L I i D . LLIVINYGVISS.PIY 
TGGAGTTCCGCGTTACATAACTTACGG 

acctcaaggcgcaatgtattgaatgccatttaccgggcggaccgactggcgggttgctgggggcgggtaactgcagttattactgcatacaagggtatca 400 

G V P R Y I T Y G K V P A V L T A Q R P PPIDVNNOVCSHS 

aacgccaatagggactttccattgacgtcaatgggtggactatttacggtaaactgcccacttggcagtacatcaag 

rTGCGGTTATCCCTGAAAGGTAACTGCAGTTACCCACCTGATAAATGCCATTTGACGGGTGAACCGTCATGTAGTTCACATAG TATACGGTTCATGCGGG ^ 
NAN ROFPL TS MGG LF TVNC PLGSTSSVSYAKYA 

TGACGTC AATGACGGTAAATGGCCCGCCTGGCAT TATGCCCAG TAC ATGACC TTATGGG ACTTTCC TACTTGGCAGT AC ATCT ACG T ATTAGTC A 
GGATAACTGCAGTTACTGCCATTTACCGGGCGGACCGTAATACGGGTCATGTACTGGAATACCCTGAAAGGATGAACCGTCATGTAGATGCATAATCAGT 
PY * RQ ' R , -^A RLALCPV HPL MGLSY L A V H L R I S H 

TCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATC 

AGCGATAATGGTACCACTACGCCAAAACCGTCATGTAGTTACCCGCACCTATCGCCAAACTGAGTGCCCCTAAAGGTTCAGAGGTGGGGTAACTGCAGTT ^ 
R Y Y . H G 0 A V L A V H Q V A V I A V . L T G I S K S P P H . R Q 

TGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAAC TCCGCCCCATTGACGC AAATGGGCGGTAGGCGTGTACGGTGGGA5 

ACCCrCAAACAAAACCGTGGTTTTAGTTGCCCTGAAAGGTTTTACAGCATTGTTGAGGCGGGGTAACTGCGTTTACCCGCCATCCGCACATGCCACCCK ^ 
* E F V L A P K S T , G L S K M S . Q L ft P 1 0 A N G R . AC T V G 

^ATATAAGCAGAGC TC TCTGGCTAACTAGAGAACCCACTGCTT AC TG3CTTATCGAAAT TAATACGACTC ACTATAGGGAGACCCAAGC TGGCTAGC 
CAGATATATTCGTCTCGAGAGACCGATTGATCTCTTGGGTGACGAATGACCGAATAGCTTTAATTATGCTGAGTGATATCCCTCTGGGTTCCACCGATr': ^ 

i > , 

T7 promoter priming site 

G L Y K Q S S L A N .■ » T H C L L A Y R N . Y D S L . G D P S V L A 
GTTTAAACTTAAGCTrACCATGGGGGGTTCTCATCAT^ 

CAAATTTGAATTCGAATGGTACCCCCCAAGAGTAGTAGTAGTAGTAGTACCATACCGATCGTACTGACCACCTGTCGTTTACCCAGCCCTAGACATGCTG ^ 

E- ■ K 
ProBond binding domain 
P « L < L T H G G S H H HHHHGMASMTGGQONGROLYO 



GATGACGATAAGGTACCTAAGGATCCAG TGTGGTGGAAT TCTGCAGATATCGAATTCCTGCAGCCCCT GCTCTTCAGCCAGATGC TGGACCC AGAG TCCII 

c tac tgc tattccatggattcctaggtcacaccaccttaagacgtctatagcttaaggacgtcggggacgagaagtcggtctacgacctgggt--*tc*gg'- 

> ■ 



-insert pLM1 



-ORFpLMl 

D 0 0 < V P K 0 P V V V N S A 0 I E F L Q PLLF S0MLDPE3 

agagaaagaggacagtgcagaatgtcctggatctccggcagaacctggaagagaccatgtccagcctgcgag ggtcccaggtgac ICACAGCTCCC TGGA 
tctctttctcctgtcacgtcttacaggacctagaggccgtcttggaccttctctggtacaggtcggacgctcccagggtccactgagtgtcgagggacc: 



-ORF pLMl 



qr krtvqnvl dlrqnleethsslrgs o v t h s s l e 
gatgacctgctacgacagcgatgatgccaacccacgcagcgtgtccagcctctccaaccgctcgtcccctctgt^ 

ctactggacgatgctgtcgctactacggttgggtgcgtcgcacaggtcggagaggttggcgagcaggggagacagtaccgcgataccggtcaggtcaggc 



-insert pLM1 



"ORF pLM1 



M T C y 0 5 0 D A NPRSVSSLSNRSSPLSWRYGQSSP 



cggctgcaggctggtgacgcgccctctgtgggtgggagctgccgctcggaggggacgcccgcctggtacatgcacgg cgaacgggcccactactcccac. 

GCCGACGTCCGACCACTGCGCGGGAGACACCCACCCTCGACGGCGAGCCTCCCCTGCGGGCGGACC ATGTACGTGCCGCTTGCCCGGGTGATGAGGGTG" 



■insert pLMl 



-ORF pLMl 



R L Q A G 0 A.-PSVGGSC RSEGTP AVYMHGERAHY 



S H 



CCATGCCCATGCGCAGCCCCAGCAAGCTCAGCCATArCTCCCGCCTGGAGCTGGTCGAATCCCTGGACrCGGArGAGGTGGACCTCAAGTCCGGCT A CA" 

ggtacgggtacgcgtcggggtcgttcgagtcggtatagagggcggacctcgaccagcttagggacctgagcctactccacctggagttcaggccga TGT J 




TMPMRSPS 



-ORF pLMl 

* L SHISRLELVESLDSDEVO 



L K S G Y F 



GAGCGACAGTGACCTCATGGGCAAGACC ATGACGGAGGA rGATGACATCACTACCGGCTGGGATGAAAGCAGC TCCATCAGTAGT3GAC TCA GCGAfGCC 
CTCGCTGTCACTGGAGTACCCGTTCTGG TACTGCCTCCTACTACTGTAGTGATGGCCGACCCTACTTTCGTCGAGGTAGTCArCACC TCAGTCGCT ACG" 



-insert pLM1 



■ORF pLM1 



3 0 S D L M G KTMTEODOITTGVOESSSISSG 



L S 0 A 
pLM3 (1 > 10847) Site and Sequence 

TCAGACAATCTC AGTTC AGAAGAATTCAATGCCAGCTCCTCACTCAAC TCCCTCC CAAGTAC TCCCACTGC TTCTCGCAGGAACTCAACAATAG TGCTAC 
AGTCTGTTAGAGTC AAGTCTTCTTAAGTTACGGTCGAGGAGTGAGTTGAGGGAGGGTTCATGAGGGTGACGAAGAGCGTCCTTGAGTTGTTArc AC GATS 



■insert pLMl 



-ORF pLM1 



S 0 N L S S £ E F N A SSSLNSLPSTPTASRRNST !VL 

GCACAGACTCAGAGAAGCGCTCACTGGCAGAAAGTGGGCTGAGCTGGTTTAGTGAATCAGAGGAGAAAGCCCCTAAAAAACTGGAGTACGACAGTGGTA3 


CGTGTCTGAGTCTCTTCGCGAGTGACCGTCTTTCACCCGACTCGACCAAA TCACTTAGTCTCCTCTTTCGGGGATTTTTTGACCTCATGCTGTCACCAT" 



■insert pLM1 



R T O S E K R S L A E S G L S V F S E 5 E E K A P K KLEY0SG3 
CCTGAAGATGGAACCTGGGACTTCTAAGTGGCGGAG GGAGCGGCCTGAGAGCTGTGATGATTCATCCAAGGGTGGAGAACTGAAAAAGCCCATCAGCCTG 


GGACTTCTACCTTGGACCCTGAAGATTCACCGCCTCCCTCGCCGGACTCTCGACACTACTAAGTAGGTTCCCACCTCTTGACrTTTTCGGGTAGTCGGAC " ' 



-insert pLMl 



-ORF pLM1 



I * M E P G T S K V R R E RPESCODSSKGGEL K K P ISL 

GGCCACCCTGGTTCCCTGAAGAAG GGCAAGACCCCACCTGTGGCTGTAACTTCCCCCATCAC TCACACAGCCCAGAG7GCCCTCAAAGTCGCAGGC 4AAC 


CCGGTGGGACCAAGGGACTTCTTCCCGTTCTGGGGTGGACACCGACATTGAAGGGGGTAGTGAGTGTGTCGGGTCTCACGGGAGTTTCAGCGTCCGTTTG 



-insert pLMl 



-ORF pLMl 



G H P G S L K K G K T P P V A V JSP I T H T A Q 3 A L K V A G T 
CrGAGGGCAAAGCTACAGACAAGGGTAAGCTTGCAGTGAAGAATACTGGGCTCCAACGCTCCTCC TCTGATGC TGGTCGGGACCGCCTGAGTGA7G:7AA 

gactcccgtttcgatgtctgttcccattcgaacgtcacttcttatgacccgaggttgcgaggaggagactacgaccagccctggcggac rcACTACtiAT T " ,iA 



-insert pLMl 



"ORF pLM1 



AGRDRLSD 



p £ GKATOKGKtAVKNTGLORSSSO 

GAAGCCCCCCTCGGGCATTGCTCG CCCCrCCACTTCGGGATCCTTCGGCTACAAGAAGCCTCCTCCTGCCACAGGCACAGCCACTGTCATGCAAACTGGT 

CTTCGGGGGGAGCCCGTAACGAGCGGGGAGGTGAAGCCCTAGGAAGCCGATGTTCTTCGGAGGAGGACGGTGTCCGTGTCGGTGACAGTACGTTTGACCA 



-insert pLM1 



ORF pLMl — 

K P P S <* IARPSTSGSFGYKKPPPATGTATVMQTG 
GGTTC AGCCACTCTCAGCAAGATCCAGAAGTCCTCAGGC ATCCCTG TCAAGCCAGTAAATGGGCGCAAG AC TAGCTTAGATGTTTCCAACAGCGCAGAGi; 
CCAAG T"CGGTGAGAGTCGTTCTAGGTCTTCAGGaGTCCGTAGGGACAGTTCGGTCATTTACCCGCGTTC TG AT CGAA TC T AC A AAGG TTGTCGCGTCTC3 "'' 



■insert pLMl 



-ORF pLM1 



GSATLSK IQKS5G IPVKPVNGRKT 



SLOVSNSAE 



CAGGATTCCTGGCTCCrGGAGCCCGTTC TAACATCCACTACCGCAGCCTGCCCCGGCC AGCC AAGTCAAGTTC TATGAGCGT GACCGGCGGGCGGGGTGc 
GTCCTAAGGACCGAGGACCTCGGGCAAGATTGTAGGTCATGGCGTCGGACGGGGCCGGTCGGTTCAGTTCAAGATACTCGCACTGGC ^ 



-ORF pLMl 



P G F L A P G A R S N J Q Y R S L P R P A K SSSMSVTGGRGG 


ACCTCGCCCTGTGAGCAGCACCA^ 

TGGAGCGGGACACTCGTCGTCGTAACTGGGGTCAGAGGAGTCGTGGTTCGTCCCTCCGGAATGCGGAAGGTCTGACTTCCTC6GATGGTTCCATCGGTCA 



-insert pLM1 



-ORF pLM1 



P R P V S S S I D P 5 L L S T K Q G G L T P S R L K E P T k V A -3 
GGGCGGACCACTCCAGCCCCTGTCAATCAGACAGATCGGGAAAAGGAGAAGGCCAAAGCCAAGGCAGTGGCCTrGGACTC AGA^ 

CCCGCCTGGTGAGGTCGGGGACAGTTAG tctgtc tagcccttttcctcttccggtttcggttccgtcaccggaacctgag TCTGTTGTAGAGGAACTTCT 



-insert pLMl 



-ORF pLM1 



G R r T P 



APVNOTOREKEKAKA 



KAVALD30N I S L ^ 



gtattggctccccagagagtactcccaagaaccaagcaagccaccccacagccaccaagctggcagagctgccaccaacccctctcagggcc acagc 

CATAACCGAGGGGTCrCTC ATGAGGGTTCTTGGTTCGTTCGGTGGGGTGTCGGTGGTTCGACCGTC TCGACGGTGGTTGGGGAGAGTCCCGG TG TCGC T ~ ^ 




-ORF pLM1 

3 ' G 5 P E S TPKNQASHPTATKLAELPPT 



P L R A T 



GAGCrTTGTCAAACCACCCTCACTAGCCAATCTTGACAAGGTCAACTCCAACAGTCTGGATCTACCATCArcCAGTGATACCACCCATGCTTCAAAGGT- 
CTCGAAACAGTTTGGTGGGAGTGATCGGrTAGAACTGTTCCAGTTGAGGTTGTCAGACCTAGATGGTAGTAGGTCACTATGGTGGGTACGAAGTTTCrA." 




SL0LPSS3DT THA 
CC AGA TC TGCATGC TAC AAGC TC AGCATC TGGGGGCC CTCTCCCTTCCTGCTTCACCC CC AG TCCGGCACCCATCC TCAA TAT TA AC TCAGCCASC TTC~ 
OO'CTAGACGTACGATGTTCGAGTCGTAGACCCCCGGGAGAGGGAAGGACGAAGTGGGGGTCAGGCCGTGGGTAGGAGTTATAATTGAGTCGGTCGAAGA 



■insert pLM1 



-ORF pLM1 



SQL HATSS AS GGP L P SCFTPSPAP I L N 1N3ASF 

CCCAGGGCCTGGAGCTAATGAGTGGTTTCAGTGTG CCAAAAGAGACCCGC ATGTACCCCAAACTCTCAGGCCTGCACAGGAGCATGGAGTCCCTCC AGAT 
GG2TCCCGGACC TCGATTACTCACCAAAGTCACACGGTTTTCTCTGGGCGTACATGGGGTTTGAGAGTCCGGAC6TGTCCTCGTACCTCAGGGAGG TCTA 



-insert pLMl 



-ORF pLM1 



3 Q G L E L M S G F S V P K E T R M Y P K L SGLHRSMESLGN 
GCCAATGAGCCTCCCCAGTGCCTTC CCCAGCAGTACTCCCGTCCCCACCCCACCTGCTCCCCCTGCTGCTCCCACAGAAGAAGAGACGGAAGAGCTGACT 


CGGTTACTCGGAGGGGTCACGGAAGGGGTCGTCATGAGGGCAGGGGTGGGGTGGACGAGGGGGACGACGAGGGTGTCTTCTTCTC TOCCTTCTCGACTGA 



-insert pLMl 



-ORF pLMl 



p _M S L P S A F P S S TPVPTPPAPPAAPTEEETEELT 

T'jCjAG TGGAAGCCCCAGAGCTGGGCAAC TGGACAGTAATCAGCGGGATCGGAACACTCTTCCCAAGAAAGGGCTC AGGTACCAGC TTCAGTCCCAGGACili 
ACCTCACCTTCGGGGTCTCGACCCGTTGACCTGTCATTAGTCGCCCTAGCCTTGTGAGAAGGGTTCTTTCCCGAGTCCATGGTCGAAGTCAGGGTCCTC: 



-insert pLM1 



-ORF pLM1 



" SG 5PRAG Q L OSNQRDRNTLPKKGLRYQLQSQE 

AGACC AAGGAGAGGCGACATTCCCATACCATTGGTGGGC TGCCTGAATCCGATGACCA GTCAGAGCTGCCTTC TCCCCCTGCACTTCCC ATGTCTC TGAH 
"CrGGTTCCTCTCCGCTGTAAGGGTATGGTAACCACCCGACGGACTTAGGCTACTGGTCAGTCTCGACGGAAGAGGGGGACGTGAAGGGTACAGAGACT: 



•insert pLMl 



-ORF pLMl 



ErkEaR "SHTlGG L PESODQSELPSPPALPMSL :: 

TGC AAAGGGCCAAC TTACCAACATAGTGAGTCCCACTGCGGCCACCACGCC AAGAATCACCCGCTCCAACAGC ATCCCCACCC ACGAGGCGGCCTTCGAC; 
ACGTTTCCCGGTTGAATGGTTGTATCACTCAGGGTGACGCCGGTGGTGCGGTTCTTAGTGGGCGAGGTTGTCGTAGGGGTGGGTGCTCCGCCGGAAoCTC 



insert pLM1 - 

" " " - ORFpLMl 

A K G Q LTNIVSPTAATTPRITRSNS IPTHEAAFE 






WO 98/24810 118/270 PCT/EP97/06956 



Tuesday, 18 November 1997 13:58 


CTGTACAGCGGCTCCCAAA |GGGGAGCACCCTGTCCC TGGCCGAGAGACCCAAGGGAATGAT fCGG TCAGGATCCTTC CGAGACCCCACGGACG ATGTT^ 
GACATGTCGCCGAGGGTTTACCCCTCGTGGGACAGGGACCGGCTCTCTGGGTTCCCTTACTAAGCCAGTCC TAGGAAGGC TC TGGGG TGCC TGC TACAAU 






35: 




LYSGSQMGSTLSLAERPKGMIR 



SGSFROPTDO 



ACGGC TC ~GTGCTG TCCCTGGCCTCCAG ^GCCTCCTCCACCTACTCCTCAGCTGAGGAGAGGATGC AATCTGAGCAAATCCGGAAGCTTC GTAGGGAAC" 
TGCCGAGTCACGACAGGGACCGGAGGTCACGGAGGAGGTGGATGAGGAGTCGACTCCTCTCCTACGTTAGACTCGTTTAGGCCTTCGAA 



" insert pLMl 



' ~ ORFpLMI _ 

H G 3 V L S L A S S A S S TYSSAEERMQSEQIR 



K L R R E L 



GGAATCATCCCAGGAAAAAGTGGCCACCTTGACGTCTCAGCTTTCTGCCAATGCTAATCTGGTGGCTGCTTTTGAGCAGAGCCTGGTGAATA TGACATCC 
CCTTAGT AGGGTCCTTTTTCACCGGTGGAACTGCAGAGTCGAAAGACGGTTACGATTAGACCACCGACGAAAACTCGTCTCGGACCACTTATAC tgtags 



~~ ORFpLMI 

E S S Q E K V A T L T S Q L S A N ANLVAAFEQSLVNMT-i; 



CGCCTGCGACACC TGGCAGAGACGGCCGAGGAGAAGGACACTGAGCTGCTGGATTTGCGAGAAACCATAGACTTTCTGAA^ 

gcggacgctgtggaccgtctctgccggctcctcttcctgtgactcgacgacctaaacgctctttggtatctgaaagactIctttItctt^ 



-insert pLMl 



-ORF pLM1 



L^HLAETAE EK0TELL0LRE7I0FLKK 



K N 5 E A 



AGGCAoTCaTTCAGGGAGCCCTTaATGCCTCAGAAACCACACCCAAAG AACTTCGGATCAAGAGACAAAAC TCCTCAGATAGC A TCTCAAGCCTCAACAI- 
TCCGTCAGTAAGTCCCTCGGGAATrACGGAGTCTTTGGTGTGGGTTTC TTGAAGCCTAGTTC TCTGTTTTGAGGAGTCTATCGTAGAGTTCnriArtTrr. T" 



•insert pLMl 



— ORF pLMl 

Q A v IGGALNASETTPKE 



t-RiKRQNSSDSIS 



S L N 



CATCACT AGCCAT TCCAGCATCGGCAGCAGCAAGGATGC |G A ^*GCGAAAAAGAAGAAAAAAAAGAGTTGGGTC TATGAGCTTCGAAGTTCCTTCAACAAA 
GTAGToATCGGrAAGGTCGTAGCCGTCGTCGTTCCTACGACTACGCTTTTTCTTCTTTTTTTTCTCAACCCAGATACTCGAAGCTTCAA 



" — —ORFpLMI 

, S , H S S rGSSKDADAKKKKKKSVV 



YELRSSFflh 




GrGTTCAGTATAAAAAAGGGGCCCAAGTCAGCTTCCTCATACTCGGATATAGAGGAGATTGCTACACCCGACTCTTCAGCCCCCTCATCCCCCAAACTAC 


CGCAAGTCATATTTTTTCCCCGGGTTCAGTCGAAGGAGTATGAGCCTATATCTCCTCTAACGATGTGGGCTGAGAAGTCGGGGGAGTAGGGGGTTTGATG 



-insert pLM1 



-ORF pLMI 



4 F S IKKGPKSASSYSD IEEIATPDSSAPSSPKL 

AGCATGGTTCCACAGAGACTGCTTCACCCTCCATCAAGTCCTCCACCTTGTCCTCCGTGGGCACTGATGTCACCGAGGGCCCTGCTCACCCAGCCCCCCA 

TCGTACCAAGGTGTCTCTGACGAAGTG6GAGGTAGTTCAGGAGGTGGAACAGGAGGCACCCGTGACTACAGTGGCTCCCGGGACGAGTGG6TCGGGGGGT 



-insert pLM1 



"ORF pLM1 



QHGSTETASP S I KSSTLSSVGTDVTEGPAHPAPH 

CACTAGGCTGTTCCATGCAAATGAGGAGGAGGAGCCAGAGAAGAAGGAGGTATCGGAGCTGCGCTCTGAGCTATGGGAGAAGGAAATGAAGCTTACAGAC 


GTGATCCGACAAGGTACGTTTACTCCTCCTCCTCGGTCTCTTCTTCCTCCATAGCCTCGACGCGAGACTCGATACCCTCTTCCTTTACTTCGAATGTCTG 



-insert pLMI 



-ORF pLM1 



TRLFHANEEEEPEKKEVSELRSELVEKEMKLTD 
ATCCGCTTGGAGGCCCTCAACTCTGCCCACCAACTGG 

TAGGCGAACCTCCGGGAGTTGAGACGGGTGGTTGACCTAGTCGAAGCCCTCTGGTACGTGTTGTACGTCAACCTCCACCTGGACGACTTTCGTCTCTTAC 



■insert pLMI 



-ORF pLM1 



[ RLE ALNSAHQLOQLRETMHNMOLEVDLLKAEN 

ACCGACT3AAGGTAGCCCCAGGCCCCTCATCAGGCTCCACTCCAGGGCAGGTCCCTGGATCATCTGCATTATCTTCCCCACGCCGCTCCCTAGGCCTGGC 
rGGCTGACTTCCATCGGGGTCCGGGGAGTAGTCCGAGGTGAGGTCCCGTCCAGGGACCTAGTAGACGTAATAGAAGGGGTGCGGCGAGGGATCCGGACCG 



•insert pLMI 



-ORF pLMI 



ORLKVAPGPSSGS TPGOVPGSSALSSPRRSLGL A 
ACTCACCCATTCCTTCGGCCCCAGTCTTGCAGACACAGACCTGTCACCCATGGATGGCATCAGTACTTGTGGTCCAAAGGAGGAAGTGACCCTCCGGGT3 


TGAGTGGGTAAGGAAGCCGGGGTCAGAACGTCTGTGTCTGGACAGTGGGT ACCTACCGTAGTCATGAACACCAGGTTTCCTCCTTCACTGGGAGGCCCAC 



•insert pLMI 



-ORF pLMI 



LTHSFGPSLADTDLSPMOGISTCGPKEEVTLRV 




GTGGTGAGGATGCCCCCGCAGCACATCATCAAAGGGGACTTGAAGCAGCAGGA ATTCTTCCTGGGCTGTAGCAAGGTCAGTGGAAAAGTTGACTGG^OA 
C ACCACTCCTACGGGGGCGTCGTGTAGTAGTTTCCCC TGAAC TTCGTCGTCCTTAAG AAGGACCCGACATCGTTCCAGTCACC TTTTCAACTGACC TTCT 



"insert pLMl 



-ORF pLM1 



V V R H P P Q H I I K G 0 L K Q Q E F F L G C S K V S G K V D V K 

TGCTGGATGAAGCTGTTTTCCAAGTGTTCAAGGACTATATTTCTAAAATGGACCCAGCCTCTACCCTGGGACTAAGCACTGAGTCCATCCATGGCTACAG 

ACGACCTACTTCGACAAAAGGTTCACAAGTTCCTGArATAAAGATTTTACCTGGG7CGGAGATGGGACCCTGATTCGTGACTCAGGTAGGfACCGATGTC 



- insert pLM1 



-ORF pLMl 



ML OEA VF QV.F KD Y 1S KM DPASTLGLSTES IHGYS 

CATCAGCCACGTGAAACGAGTGTTGGATGCAGAGCCCCCCGAGATGCCTCCTTGCCGTCGAGGTGTCAATAACATA TCAGTCTCCCTCAAAGGTCTGAAG 
GTAGTCGGTGCACTTTGCTCACAACCTACGTCTCGGGGGGCTCTACGGAGGAACGGCAGC TCCACAGTTATTG TATAGTCAGAGGGAGTTTCCAGACTTC 



-insert pLMl 



-ORF pLM1 



I S H V K R V L Q A E P P E M P P C RRGVNN ISVSLKGLK 

GAGAAATGCGTCGACAGCCTGGTGTTCGAGACGCTGATCCCCAAGCCGATGATGCAGCAC TACATAAGCCT CC TGC TG AAGC ACC GGCG CC TCG TC CTCT 
CTCTTTACGCAGCTGTCGGACCACAAGCTC TGCGACTAGGGGTTCGGC TACTACGTCGTGArGTATTCGGAGGACGACTTCG TGGCCGCGGAGCAG3AGA 



-insert pLMl 



-ORF pLM1 



E K C V 0 S L V F E T L I P K P M M Q H Y ISLLLKHRRLVL 

CGGGCCCCAGCGGCACGGGCAAGACCTACCTGACCAATCGCTTGGCCGAGTACCTGGTGGAGC GC TC TGGCCG TGAGGTC AC AGAGGGC ATCGTCA3C £Z 
GCCCGGGGTCGCCGTGCCCGTTCTGGATGGACTGGTTAGCGAACCGGCTCATGGACCACCTCGCGAGACCGGCACTCCAGTGTCTCCCGTAGCAGT'JGTo 



-insert pLM1 



-ORF pLMl 



SG p SGTG KTY LTNRL AE Y LVERSGREVTEG I VST 

CTTCAAC ATGCACC ^GCAGTCTTGCAAGGATCTGCAACTGTA TCTTTCCAACCTAGCCAACC AGATAGACCGGGAAAC AGGAATTGGGGATG TGCCCCTG 
GAAGTTGTACGTGGTCGTCAGAACGTTCCTAGACGTTGACATAGAAAGGTT3GATCGGTTGGTCTATCTGGCCCTTTGTCCTTAACCCCTACACGG3GAC 



- insert pLMl 



" " ORFpLMI 

F N M . H Q Q S CKOLOL YLSMLAMQ I0RETG I GOV PL 




GTGATTC TATTGGATGACCTGAG TGAAGCA GGCTCCATC AGTGAGTTGGTCAATGGGGCCCTCACCTGCAAGTATC AT AAATGTCCCTAT ATTALA 3GTA 
CACTAAGATAACCTACTGGACTCACTTCGTCCGAGGTAGrCACTCAACCAGTTACCCCGGGAGTGGACGTrCAT AGTATTTACAGGGATATAATATCCAT 

— insert pLMl — 



■ — — ORFpLMl ZZ 

V I LLDDLSEAGS t SELVNGAL TCKYHKCPY I I q 

CCACCAATCAGCCTGTAAAAATGACACCCAACCATGGCTTGCACTTGAGCTTCAGGATGTTGACCTTCTCCAACAACGTGGAGCCAGCCAATGGCTTCrT 


GGTGGTTAGTCGGACATTTTTAC TGTGG6TTGGTACCGAACGTGAACTCGAAGTCCTACAAC TGGAAGAGGTTGTTGCACCTCGGTC6GTTACCGAAGGA 

insert pLMl 

ORFpLMl — IZII 
TTNQP VKMTPNHGLHL SFRML TFSNNVEPANGFL 



GGTTCGTTACCTGAGGAGGAAGCTGGTA GAGTCAGACAGCGACATCAATGCCAACAAGGAAGAGCTGCTTCGGGTGCTCGACTGGGTACCCAAGCT3TGG 


CCAAGCAATGGACTCCTCCTTCGACCATCTCAGTCTGTCGCTGTAGTTACGGTTGTTCCTTCTCGACGAAGCCCACGAGCTGACCCATGGGTTCGACACC " 



— insert pLM1 

QRF pLMl 

VRYLRRKLVESDSDINANKEELLRVLDVVPKLV 


TATCATCTCCACACCTTCCTTGA GAAGCACAGCACCTCAGACTTCCTCATCGGCCCTTGCTTCTTTCTGTCGTGTCCCATTGGCATTGAGGACTTCCGGA 


ATAGTAGAGGTGTGGAAGGAACTCTTCGTGTCGTGGAGTCTGAAGGAGTAGCCGGGAACGAAGAAAGACAGC ACAGGGTAACCGTAACTCCTGAAGGCCT 

insert pLMl — 



— ORFpLMl 

Y H L H T F L E K H S T S P F L 1 G PCFFLSCP [G IEDFR 

CCTGGTTCmTTGACCTGTGGAACAACTC TATCATTCCCTATCrACAGGAAGGAGCCAAGGATGG GATAAAGGTCCATGGACAGAAAGCTGCTTGGGAGGA 
GGACCAAGTAAC TGGAC ACCTTGTTGAGATAGTAAGGGATAGATGTCC TTCCTCGGTTCCTACCCTATTTCCAGGTACC TGTCTTTCGACGAACCC TCC T ° 

insert pLMl 

ORFpLMl ~ 

T V F I 0 L V N N S I i P Y LQEGAKOG IKVHGQKAAVED 

CCCAGTGGAATGGGTCCGGGAC ACACTTCCCTGGCCATCAGCCCAACAAGACCAATCAAAGCTGTACCACCTGCCCCCACCCACCGTGGGCCCTCACAGC 


GGGTCACCTTACCCAGGCCCTGTGTGAAGGGACCGGTAGTCGGGTTGTTCTGGTTAGTTTCGACATGGTG GACGGGGGTGGGTGGCACCCGGGAGT5TCG 

insert pLMl 



— ~ " ORFpLMl — 

PV E VVRDTLPVP3AQ0DGSKLYMLPPPTVGPH 




ATTGCCTCACCTCCCGAGGATAGGACAGTCAAAGACAGCACCCCAAGTTCTCTGGACTCAGATCCTCTGA TGGCCA TGCTGC TGAAACTTC 4AGAAGC TG 
TAACGGAGTGGAGGGCTCCTATCCTGTCAGTTTCTGTCGTGGGGTTCAAGAGACCTGAGTCrAGGAGACTACCGGTACGACGACTTTGAAoTTCTTCGA: 



■insert pLM1 



-ORF pLMl 



1 A S P P E D R T V K D S T P S S L D S DPLMAMLLKLOEA 

CCAACTACATTGAGTCTCCAGATCGAGAAACCATCC TGGACCCCAACCTTCAGGCAACACTTTAAGGGTTCGGCAATCAC TGTCACCCCCGGAC AGCAGA 
GGTTGATGTAACTCAGAGGTCTAGCTCTTTGGTAGGACCTGGGGTTGGAAGTCCGTTGTGAAATTCCCAAGCCGTTAGTGACAGTGGGGGCCTGTCGTC" 



-insert pLMl 



-ORFpLMI I 



ANYIESPORETILDPNLQATL.GFGNHCHPRTAE 


ACGCTGGCATCAGCTATCTTAGCTCCTCCTCTCCCCTCT CCTCTTTCAGAGCACTGGCTCTCCAGCCCCAGGAGGAGAACAGGAGGGAGGAGGAGATGAA 


TGCGACCGTAGTCGATAGAATCGAGGAGGAGAGGGGAGAGGAGAAAGTCTCGTGACCGAGAGGTCGGGGTCCTCCTCTTGTCCTCCCTCCTCCTCTACTT 



-insert pLMl 



R v H Q L S . L L L S P L L F Q S T G S PAPGGEOEGGGDE 

AGAGGAGGGACAGGTTCTTGGTGCTGTACCTTTGAGAACTTCCTAGGAAGGAAT GGTGGGGTGGCGTTTGGGAACTTGTGCCCCCTAAACACATTTACTc 
TCTCCTCCCTGTCCAAGAACCACGACATGGAAACTCTTGAAGGATCCTTCCTTACCACCCCACCGCAAACCCTTGAACACGGGGGATTTGTG7AAATGA- 



•insert pLMl 



3GGTGSVCCTFENFLGRNGGVAFGNLCPLN7FT 


GCCTCCTCTAATGACTTTGGGGAAAAGATGATTCTGGGTCTTTCCCTTGACTTCTTGTTTCAATTACAA AC TCCTGGGCTTTCTGGGGACGGGTTCAGAm 
CGGAGGAGATTACTGAAACCCCTTTTCTACTAAGACCCAGAAAGGGAACTGAAGAACAAAGTTAATGTTTGAGGACCCGAAAGACCCCTCCCCAAG TCJ~ 



-insert pLM1 



G L L ■ L W G K D O S G S FP . LLVS I T N S V A F V G G '/ Q K 

AAC ATCAAAACACTGC AGCAGTTCCCCGGAATTCAGCTTGGACTTAAC CAGGCTGAACTTGCTCAAAAGAAGCCGAATTCCAGCACACTGGCGGCCGTTA 
TrGTAGTTTTGTGACGTCGTCAAGGGGCCTTAAGTCGAACCTGAArTGGTCCGACTTGAACGAGTTTTCTTCGGCTTAAGGTCGTGTGACCGCCGGCAA" 



■insert pLMl 



TS KHCSSSPEFSLDCTRLNLLKR 



SR I P A H V 3 P L 



CTAGTTCTAGAGGGCCCGTTTAAACCCGCTGATCAGCCTCGACTGTGCCTTCTAGTTGCC AGC CATCTGTTGTTTGCCCCTCCCCCGTGCC 7TCCTTG 
GATCAAGATCTCCCGGGCAAATTTGGGCGACTAGTCGGAGCTGACACGGAAGATCAACGGTCGGTAGACAACAAACGGGGAGGGGGCACGGAAoGAArT^ °" 

L VL EGPFKP AO QPRLCLLVASHLLFAPPPCLP 

CC TGGAAGGTGCCACTCCCACTGTCC TTTCCTAATAAAATGAGGAAATTGCATCGCATTGTC TGAGTAGG FGTCA T TC TATTC TGGGGGGT GG3 oTGGG'J 
GGACC TTCCACGGTGAGGG TGAC AGGAAAGGATTATTTT AC TCCTTTAACG TAGCGTAACAG AC TC ATCCACAGTAAGATAAGACCCCCCACCCCACCC^ 
p w K v p L P L S F PN K MR K L H R I V V G V 1 L F V G 7 2 * G 






CAGGACAGCAAGGGGGAGGATTGGGAAGACAATAGCAGGCATGCTGGGGATGCGGTGGGCTCTATGGCTTCTGAGGCGGAAAG/U CCAGCTGGGGC TCTA 

gtcctgtcgttccccctcctaacccttctgttatcgtccgtacgacccctacgccacccgagataccgaagactccgccIttctIggtcgaccccgagaI 67CC 

R T A R G R 1 G K T I AGML G MRUAL VLLRRKEP A G A L 
GGGGGTATCCCCACGCGCCCTGTAGCGGCGCATTAA6CGCGGCGGGTGTGGTGGTTACGCGCAGCGTGACCGCTACACTTGCC AGCGCCCTAGCGCCCGC 

cccccataggggtgcgcgggacatcgccgcgtaattcgcgccgcccacaccaccaatgcgcgtcgcactggcgatgtgaacggtcgcgggatcgcgggcg MK 



GG IPTRPVAAH. ARRVWWLRAA 



PLHLPAP.Rp 



TCCTTTCGCTTTCTTCCCTTCCTTTCTCGCCACGTTCGCCGGCTTTCCCCGTCAAGCTCTAAATCGGGGCATCCCTTTAGGGTTCCGATTTAGTGC TTTA 

aggaaagcgaaagaagggaaggaaagagcggtgcaagcggccgaaaggggcagttcgagatttagccccgtagggaaatcccaaggctaaatcacgaaaI 59C< 
L L s L s s L . p F s p R 5 p A F p v « t ■ IGASL.GSDLVLV 



CGGCACCTCGACCCCAAAAAACTTGATTAGGGTGATGGTTCACGTAGTGGGCCATCGCCC 



TGATAGACGGTTTTTCGCCrTTTKArKTr/rc Arrr^ */■ 



GCCGTGGAGCTGGGGTTTTTTGAACTAATCCCACTACCAAGTGCATCACCCGGTAGCGGGACTATCTGCCAAAAAGCGGGAAACTGCAACCTCAGGTGCA ^ 
G T S . T P * L ' " V M V H V V G H R P D R R F F A L R V S P R 

TCTTTAATAGTGG ACTCTTGTTCCAAACTGGAACAACACTCAACCCTATCTCGGTCTATTCTTTTGATTTATAAGGGATTTTGGGGAT TTCGGCCTATTG 

agaaattatcacctgagaacaaggtttgaccttgttgtgagttgggatagagccagataagaaaIctaaatattccctaaaacccctaaagccggataac 7,0C 

S L 1 V 0 S C S K L E Q H S T L S R S I L L I VKGFWGFR P , 
GTTAAAAAATGAGCT GATTTAACAAAAATTTAACGCGAATTAATTCTGT6GAATGTGTGTCAGTTAGGGTGTG6AAAGTCCC CAGGCTCCCCAGGCAGGC 

caattttttactcgactaaattgtttttaaattgcgcttaattaagacaccttacacacagtcaatcccacacctttcaggggtccgaggggtccgtccg ™- 

SVRVVKVPRLPRQA 



K MS.F NKNLTRINSVECV 

| STGTGGAAAGTCCCCAGGCTCCCCAGCAGGCAGAAGTATGCAAAGCATGCATCTCAATT 



AGAAGTATGCAAAGCATGCATCTCAATTAGTCAGCAACCAGGI 



3TTGGTCCACACCTTTCAGGGGTCCGAGGGGTCGTCCGTCTTCATACGTTTCGTACGTAGAGTTAA 
E V C - K A C 1 S ' S ° ° P .6 V E S P Q APQOAEVCKACISI 



AGTCAGCAACCATAGTCCCGCCCCTAACTCCGCCCATCCCGCCCCTAACTCCGUCCAGTTCCGCCCATTCTCCGCCCCATGGCTGACTAATTTTTTTTAT 



TCAGTCGTTGGTATCAGGGCGGGGATTGAGGCGGGTAGGGCGGGGATTGAGGCGGGTC 



"" AAGGCGGGTAAGAGGCGGGGTACCGACTGATTAAAAAAAATA 



^ *^ *^ P SRP . LRPS RP . LRPVPP I LRPHAD . FFL 

TTATGCAGAGGCCGAGGCCGCCTCTGCCTCTGAG CTATTCCAGAAGTAGT6AGGAGGCTTTTTTGGAGGCC TAGGCTTT TGCAAAAAGC TCCCGGGAGCT 

AATACGTCrCCGGCTCCGGCGGAGACGGAGACTCGATAAGGTCTTCATCACrceicCGA^AAACCTCC GGATCCGAAA^GTTnTCGAGGGCCCTCGA '** 
F " ° " P " P - P L " L S * ? » S S E E A F L E A . A F A K S S R E L 

TGTAf ATCCATTTTCGGATCTGATCAAGAGACAG GATGAGGATCGTTTCGCATGATTGAACAAGATGGATTGCACGCAG GTTCTCCGGCCGCTTnGKTf;" 

acatataggtaaaagcctagactagttctctgtcctactcctagcaaagcgtactaacttgttctacctaacgtgcgtccaagaggccggcgaacccac"" 760C 

V ' ^ F S ° > 1 K ft Q D D R F A . L N K M 0 C T Q V L R p L G w 

agaggctattcggctatgactgggcacaacag acaatcggctgctctgatgccgccgtgttccggctgtcagcgcagggg cgcccggttctttttgtcaa 
tctccgataagccgatactgacccgtgttgtctgttagccgacgagactacggcggcacmggccgaca gtcgcgtccccgcgggccaagaaaaacagt; 7?a 

" 6 * S A H T G H N R Q S A A l M P P C S G C 0 R R g A R F F L S 



GACCGACCTGTCCGGTGCCCTGAATGAACTGCAG GACGAGGCAGCGCGGCTATCGTGGCTGGCCACGACG-3GCG7TCC TTGCGCAfirTnT^<"Trr.Arf:TT 

CTGGCrGGACAGGCCACGGGACTTACTTGACGTCCTGCTCCGrCGCGCCGATAGCACCGACCGGTGCTGCCCGCAAGGA^GCGrCGACACGAGCTGCA: '** 

AFLAQLCSTL 



R P . T C P V p . MNCRTRQRGYRGVPrr 






Tuesday. 18 November 1997 13:58 
pLM3 (1 > 10647) Site and Sequence 






GrCACTGA.GCG G G,A G G G ACTG6CT G CTAT Ta GG CGAAGT 6 CCGGGGCA 5G .TCTCCTGTCATCTCACCTTGCTCCT GCC GA ., a ,.-. TJT .., T ..^.- 

CAGTGACTTCGCCCTTCCCTGACCGACGATAACCCGCTTCACGGCCCCGTCCTAGAGGAiAGTAGAGTGGAACGAGGACGGCTCTTTCArAGGTAGTAi ^ 
5 L K " E G T 6 C Y V A K C R ° " ' S C H L T L L L P p K Y p s w 

CTGATGCAATGCGGCGGCTGCAT ACGCTT G ATCCGGCTACCT G CCCATTCGACCACCAAGC G AAACATC G C ATCGAGCG AGCACGTArTrfififlTo: « 

GACTACGTTACGCCGCCGACGTAT G C G AACTA GG CCGAT G GACGGGTAAGCTGGTGGT TCGCTTTGTAGC G TAGCTCGCTCGTGCAT G AGCCTACCTTC G *** 
L " ° C G G C ' " L 1 " L P * " * T T K P N . A S S E H V L G y K 

CGGTCTTGTCGATCAG G AT G ATCTGGA CG A A G AGCATCAGG G GCTCGCGCCAGCCGAACT G TrCGCCA G GCTC AAGGCGCGCATGCCC G Ar C r.r,^^ T 

GCCAGAACAGCTAGTCCTACTAGACCTGCTTCTCGrAGTCCCCGAGCGCGGTCGGCTTG ^CAAGCGGTCCGAGTTCCGCGCGTACGGGCTGCCGCTCCTA 6 '* 
' V L S ' " " ' W T ' 5 ' * « » » 0 P N C S P G S P p A C P T A P , 

CrCGrCGTGACCCA TGG CGAT GCCrGCrTGC CGAATATC A T G GTGGAAAATG G CCGCTTTTCTGGATTCATCGAC T GTGGCCG GCT GGG r CTr ., r ,,,^ 

.AGCAGCACTGGGrACCGCTACGGACGAACGGCTTATAGlACCACCTTTTACCGGCGAAiAGACCTAAGTAGcrGACACCGGCCGACCC^ACCGCCTGG *« 
5 S •■ ' H » " " A C " ' 5 . " * « 1 A A F L D S S T V A G V „ „ p T 

G CTATCA G GACATAGCGTT G GCTACCC G TGATATTGCT GAA G AGCTTGGCGGCGAATGGGCTGACCGCTTCCTCG TGCTTTAC G GTATCGCCGCTrrrf;A 

cgata GT cctgtatc G caacc G atgggcactataacgacttctcgaaccgcc G cttacccg actggcga:gga G cacga:atgccatagcggcga GGG c; 30 " 

' ^ RWLPV IL LKSL A A N G L TASSCF TVSPLP 

TTCGCAGCGCATCGCCrrCTATCGCCTTCTTGACGAGT T CTTCTGAGCGGGACTCTGGGGTTCGAAATGACCGACCAAG CGACGCCCAACCTnrr^^, 

-aCGrCGCGTAGCGGAAGATAGCGGA.GAACTGCTCAAGAAGACTCGCCCTGAGACCCCAAGCTrTAclGGCTGGTTCGCTGCGGGTTGGACGGTAGTG W 
S ' S " 5 ' A F 1 T S S 5 E ■» ■> 5 « V P W 0 P P S P A Q p A , T 

GA G ATTTC G ATTCCACC G CCGCCrTCTAT G AAAG G TT GG GCTTCGGAATC G TTTTCCG G GACGCCGGCTGGAT G A TrrTrr A^rftrftftjT^ . 

C .CTAAAGCTAAGGTGGCGGCGGAAGATACTrTCCAACCCGAAGCCTrAGCAAAAGGCCCTGCGGCCGACCTACTAGGAGGrCGCGCCCCrAGAGTACG; 

" F " " R 1 L • K V G L R N »^ C » « L D 0 P P A P G S H A 

GGAGTTCTTCGCCCACCCCAACTTGTTTATTGCAGCTTAT AATGGTTACAAATAAAGCAATAGCATCACAAAT TTCA CAAATAAAGCATTTTTTTC Af Ts 

C .CAAGAAGCGGGTGG^rTGAACAAATAACGTCGAATATTACCAArGlTTATTTCGTTATC GTAGTGTTTAAAGTGTTTATTTCGTAAAAAAAGrG^ "* 
J '^*"^* K Q -^HKFHK.S!FFT 

uma!a| 1 '^ 1 '^^^^^^^^ ^ G ^ A ^^"^f ^^ ^ ^^^^^^^^^^ACCTCTAGCTAGAGCTTGGCGTAATC ATGGTCArAGCTGT 
GTAAGATCAACACCAAACAGGTTTGA G TA G TTACATA G AATA G TACAGACATATGGCAGCTGGAGATCGArCTCGAACCGCATTAGTACCAGTATCGA'"« ^ 
'*-^^V QTHQC I LSCLYTVDL .LELGV I M V t A V 

AAGGACACACTTTAACAATAGGCGAGTGTTAAGCTGTGTTGTATGCTCGGCCTTCGTATTTCACATTTCGGACCCCACGGATTACTCAC TCGATTGAGTG "« 
" " L 1 5 A " N S T °. " T S P K H K V , S L G C L M S E L T H 

T ATrAACGCAACGCGA G TGACGGGC G AAAGGTCAGCCCTTTGGACAGC;CG G TCGACG;AATTACTTAGCCG G TTGC G CGCCCCrCTCC G CCAAACG f -A "« 
" *.MNRPTRGERRFA 

~~^*"~^^ TT ^ G ^^^^^ T *' A ^^^~ T ^^^ G ^ G ^^ GGT ^ G }^ GG ^^ G ^^"^^^^ ^ ^T^ ftG ^ T ^"^^ A ^AGGCGGTAA TACGGTTAT 
~ ^ CG ^AG G CGAA GG A G CGAG T GACTGA G CGACGC G AGCCA G CAA G CCGAC G CCGCTC G 7r A T :GrCGAGT G A C TTrCC G CCArrATGCCAAT; *** 
~~ F - L ^ ^ ^ G R S A A A S G I S 3 L K G G N T" V 1 



36CC 
CCACAGAATCAGGGGATAACGCAGGAAAG AACATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTG CTGGCGTTTTTCCATAG 
GGrGTCTTAGTCCCCTATTGCGTCCTTTCTTGTACACTCGTTTTCCGG ^S^TTTCCGGTCCTTGGC ATTfTTCCGGCGCAACGACCGCAAAAAGG TATC " C " 

KGRVAGVFP 



HR, »G »RKEHWSKRPAKGQEP. 



GCTCCGCCCCCCTGACGAGCATCACAAAAATC GACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATAC CAGGrr.TTTr.rr.r...^- 

CGAGGCGGGGGGACTGCTCGTAGTGTTTTTAGCTGCGAGTTCAGTCTCCACCGCTTTGGGCTGTCCT GATATTTCTATGGTCCGCAAAGGGGGACCTTCG 7 ^ 

G V S P V K 



APP . P • R^SOK STLKS EVAKP0RT1 K1P 



TCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGC TTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCGCTT TCTCAATnrrr Arrrr>;TA 

agggagcacgcgagaggacaaggctgggacggcgaatggcctatggacaggcggaaagagggaagccct Icgcaccgcg^aagagttacgagtgcgaca; 93<>: 



LP BALSC SOP AAYRIPVBLSPFGK 



RGAFSML TL 



GGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAG CTGGGCTG T GTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGC CTTATCC G G Ta ArT AT r. T , T 

ccatagagtcaagccacatccagcaagcgaggttcgacccgacacacgtgcttggggggcaagtcgggctggcgacgcggaataggccaItgamg 9aC,: 

L S S 



VS 0 F G V G R S L Q A G L C A R T P R S A R P U R L , „ 



TGAGTCCAACCCGGTAAGACACGACTTATCGCCACTG GCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATG TAGGCGGTGCTACAGAGTTrTT, 

ACTCAGGrTGGGCCATTCTGTGCTGAATAGCGGTGACCGTCGTCGGTGACCATTGTCCTiATCGTCTCGCTCCAlACATCCGCCACGATGTCTCAAGAAC ^ 
VQ PGKTRL I ATGSSHtf. QQ . q S E V C R R C Y R V L 



AAGTGGTGGCCTAACTACGGCTACACTAGAAGGACAGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTC6GAAA, 



AAGAG TTGG T AGCTCT TGAT 

ATAAACCATAGACGCGAGACGACTTCGGTCAATGGAAGCCTTTTTCTCAACCATCGAGAACTA 
K D S 1 V Y L R S A E A SYLRKKSW. LL I 



TTCACCACCGGATTGATGCCGATGTGATCTTCCTGTCATAAACCATAGACGCGAGACGACTTraRTr^T.^.^-^L '.[ J 

E V V A . LRLH 



CCGGCAAACAAACCACCGCTGGTAGCGGTGGTTT TTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGAT CTCAAGAAGATCCTTTGATrTTTTr 

GGCCGTTTGTTTGGTGGCGACCATCGCCACCAAAAAAAC AAACGTTCGTCGTCTAATGCGCGTCTTTTTTTCCTAGAGTTCTTCTAGGAAAC TAGAAAA'J 
" ° T N " " V - » V F F ' > ° ' A 0 Y A Q K K R , S „ R s F 0 L F 

■"ACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTT 



rTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATC TTCACCTAG ATCCTTTTA 



AATTAAAAA 



A'GCCCC AGACTGCGAGTCACCTTGCTTTTGAGTGCAATTCCCTAAAACCAGTACTCTAATAGTTTTTCCTAGAAGTGGATC rAGGAAAATTTAATTTTT *** 



G V - RSVEPKLTLRO 



F G H E 1 ' K »<OLHLDPFKLt; 



TGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATnr 



, , TTAATCAGTGAGG CACCTATCTCAGCGATCTGTCTATTTC 

ACrKAAAATTTAGTTAGATTTCArATArACTCATTrGAACCAGACTGTCAATGGTT ACGAATTAGTCAC^ 

Q . L P M > N 0 . G T Y L S 0 L S I 3 



" K F »NLKY!. VNLV 



G7TCATCCATAGTTGCC 



TGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACC 



t ^ . ATCTGGCCCCAG TGCTGCAATG ATACCGCGAGACCCACG 

" r ^ T ^'' r TATGGCGCTCTGGGTGC 
IWPQCCNOTARPT 



CAAGtAGGTATCAACGGACTGAGGGGCAGCACATCTATTGATGCTATGCCCTCCCGAATGGT AGACCGGGGTCACGACGTTACT ~ ~ ~ * 

F 1 H , S C L T P«RV0NY0TGGLT 



CTCACCGGCTCCAGATTTATCAGCAATAAACCAG CCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACT TTATCCGCCTCCATCC AG TCTATTAAT 
L G S R F ISNKPASRKGRAQKV 



SCNFIRLHR 



V Y 



T.TTGCC.GGAAGCTAGAG^^ 

M^AACGGCCCTICGAICTCArTCATCAAGCGGTCAATTArCAAACGCGTTGCAACAACGGTAACGATGTCCGTAGCACCACAGTGCGAGCAGCAAACCAT 

- FAQRCCHC Y R H R G V T U V V « V 
accgaagtaagtcgaggccaagggttgctagttccgctcaatgtactagggggtacaacacgttttttcgccaatcgaggaagccaggaggctagcaaca '° 3t 

G / '■ ° L P F P T ' < A S Y M IPHVVQKSG 



L L R S S 0 B 



CAGAAGTAAGTTGGCCGCAGTGTTATCACTC 



rCATGGTTATGGCAGCACTGC ATAATTCTCTTACTGTCATGCCATCCGTA A^T^rTxr.^^^^^ 

gtcttcattcaaccggcgtcacaatagtgagtaccaataccgtcgtgacgtattaagagaatgacag tacggtaggcattctacgaaaagacactgacca * t '* tit 
VGRSVI ^ hgycsta.fsycha IftKNLFCDW 



Q K 



GAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGA GTTGCTCTTGCC 

V LNQV I LR I VY A A T E L L L P G V N 



4__ , - - ; w -^CGGCGTCAATACGGGA TAATACCGCGCCACATACrAfiAArTT 

4 ' ' 1 ' > ■ i05t 



T G 



Y R A T 



0 N F 



, T j > ^^-^-^l^ A ^^ A ^^^ < '^ AAA ''^^ l "^^ t ' < '^ G< ' ( '^ AA A At ' ^ T ^ A ^^ A ^^^ A ^^^ TGTTGAGATCCAG TTCGATGTAACCCACTC GTGCArcrAj 
A, TKAC AG AG -^CTTTTGCAAGAAGCCCCGC.TlTGAGAGTTC CrAGAATGGCGACAACTCTAGGTCAAGCr^ATTGGGTGAGCACGlGGGT; 
K 5 A H H V K T F F GAKTLKOLTAV 



EIQFOVTHS 



C T Q 



CTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGA GCAAAAACAGGAAGGCA AAATGCCGCAAAAAAGCRAA TAtcecrr. 

GACrAGAAGTCGTAGAAAATGAAAGTGGTCGCAAAGACCCACTCGTTTTlGrCCTTCCGlrT TACGGCGlTTTTTCCCTlATTCCCGCTGTGCCTTTAcI ' 0 * 
' ' ' ' F Y F " ° » F " V 5 ' » ' A K C RKKGNKGDTCM 



^^T^^L^^.^.^^^ ^^"^^"^^^^^"^y^ A ^ ^^ AA ^^ A ^ " ^T A T C AGG G TTATTGTC TCATGAGCGGATACA TATTTGAATG TATT TAGAA AAATAAAC 

acttatgagtatgagaaggaaaaagttataataacttcgtaaatagtcccaa taacagagtactcgcc^tgtataaacItac- 1031 

LNTHTLPFS 1LLKHLSG LLSHERIHI 



CATAAATCTTTTTATTTG 
M Y L E K . T 



AAA 



rAGGGGTTCCGCGCACATTTCCCCGAAAAGTGCCACC TGACGTC 
TTTATCCCCAAGGCGCGTGTAAAGGGGCTTTTCACGGTGGACTGCAG 



108a7 



NRGSAHISP 



K S A T 
Sequence of pTB72, an expression vector incorporating C. elegans
UNC-53. The open reading frames (ORFs) of the prolinker + Ce-UNC-53
(upper ORF) and of Ce-UNC-53 alone (lower ORF) are listed under the sequence.

modified from pb72.do patent by changing oh on circle goquonoo:Created: maandag, 6 JuQ 10960927" 

AATG G C C C GCC TGGC ATTATGCCC AGTAC ATGACCTTATGGGACTTTC CTACTTGGCAGTAC ATCTACGTATTAG 

TCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGATAGCGGT7TGACTCACGGGGA 

10 TTTCCAAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTC^ 

CGTAACAACTCCGCCCCATTGACGCAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTC 
TCTGGCTAACTAGAGAACCCACTGCTTACTGGCTTATCGAAATTAATACGACTCACTATAGGGAGACCCAAGCTT 
GGTACCGAGCTCGGATCCACTAGTAACGGCCGCCAGTGTGCTGGAATTCTGCAGATATCCATCACACTGGCGG 
CCGCCGCCATGACGACGTCAAATGTAGAATTGATACCAATCTACACGGATTGGGCCAATCGGCACCTTTCGAAG 

15 GGC AGCTTATC AAAGTC GATTAGGGATATTTCCAATGATTTTCGCGACTATCGACTGGTTTCTC AGCTTATTAAT 
G TG ATC GTTCCGATC AACGAATTCTC GC0TGC ATTG ACGAAACGT 

C C TCGAAACGTGTC TC G ACTAC CTGAAAAATCTGGGTCTCGACTG C TCGAAACTC ACC AAAACCC^^ 
GCGGAAACTTGGGTGCAGTTCTCCAGCTGCTCTTCCTGCTCTCCACCTACAAGCAGAAGCTTCGGCAACTGAAA 
AAAGATCAGAAGAAATTGGAGCAACTACCCACATCCATTATGCCACCCGCGGTTTCTAAATTACCCTCGCCACGT 
20 GTCGCCACGTCAGCAACCGCTTCAGCAACTAACCCAAATTCCAACTTTCCACAAATGTCAACATCCAGGCTTCAG 
ACTCCACAGTCAAGAATATCGAAAA7TGATTCATCAAAGATTGGTATCAAGCCAAAGACGTCTGGACTTAAACCA 
CCCTCATCATCAACCACTTCATCAAATAATACAAATTC^TTCCGTCCGTCGAGCCGTTCGAGTGGCAATAATAAT 
GTTGGCTCGACGATATCCACATCTGCGAAGAGCTTAGAATCATCATCAACGTACAGCTCTATrTCGAATCTAAAC 
CGACCTACCTCCCAACTCCAAAAACCTTCTAGACCACAAACCCAGCTAGTTCGTGTTGCTACAACTACAAAAATC 
25 GGAAGCTCAAAGCTAGCCGCTCCGAAAGCCGTGAGCACCCCAAAACTTGCTTCTGTGAAGACTA7TGGAGCAAA 
ACAAGAGCCCGATAACAGCGGTGGTGGTGGTGGTGGAATGCTGAAATTAAAGTTATTCAGTAGCAAAAACCCAT 
CTTCCTCATCGAATAGCCCACAACCTACGAGAAAGGCGGCGGCGGTGCCTCAACAACAAAC7TTGTCGAAAATC 
GCTGCCCCAGTGAAAAGTGGCCTGAAGCCGCCGACCAGTAAGCTGGGAAGTGCCACGTCTATGTCGAAGCTTT 
OA GTACGCCAAAAGTTTCCTACCGTAAAACGGACGCCCCAATCATATCTCAACAAGACTCGAAACGATGCTCAAAGA 
30 G C A GTG AAG AAG AG TC C GG ATAC G CTGGATTC AAC AGC ACG TC GCC AA C GTC ATC ATCG ACGGAAGGTTCCCT 
AAGCATGCATTCCACATCTTCCAAGAGTTCAACGTCAGACGAAAAGTCTCCGTCATCAGACGATCTTACTCTTAA 
CGCCTCCATCGTGACAGCTATCAGACAGCCGATAGCCGCAACACCGGT7TCTCCAAATATTATCAACAAGCCTG 
rTGAGGAAAAACCAACACTGGCAGTOAA^GGAGTGAAAAGCACAGCGAAAAAAGATCCACCTCCAGCTGTTCCG 
CCACGTGACACCCAGCCAACAATCGGAGTTGTTAGTCCAATTATGGCACATAAGAAGTTGACAAATGACCCCGT 
35 GATATCTGAAAAACCAGAACCTGAAAAGCTCCAATCAATGAGCATCGACACGACGGACGTTCCACCGCTTCCAC 
CTCTAAAATCAGTTGTTCCACTTAAAATGACTTCAATCCGACAACCACCAACGTACGATGTTCTTCTAAAACAAGG 
AAAAATCACATCGCCTGTCAAGTCGTTTGGATATGAGCAGTCGTCCGCGTCTGAAGACTCCATTGTGGCTCATG 
CGTCGGCTCAGGTGACTCCGCCGACAAAAACTTCTGGTAATCATTCGCTGGAGAGAAGGATGGGAAAGAATAAG 
ACATCAGAATCCAGCGGCTACACCTCTGACGCCGGTGTTGCGATGTGCGCCAAAATGAGGGAGAAGCTGAAAG 
40 AATACGATGACATGACTCGTCGAGCACAGAACGGCTATCCTGACAACTTCGAAGACAGTTCCTCCTTGTCGTCT 
GGAATATCCGATAACAACGAGCTCGACGACATATCCACGGACGATTTGTCCGGAGTAGACATGGCAACAGTCGC 
CTCCAAACATAGCGACTATTCCCACTTTGTTCGCCATCCCACGTCTTCTTCCTCAAAGCCCCGAGTCCCCAGTC 
GGTCCTCCACATCAGTCGATTCTCGATCTCGAGCAGAACAGGAGAATGTGTACAAACTTCTGTCCCAGTGCCGA 
ACGAGCCAACGTGGCGCCGCTGCCACCTCAACCTTCGGACAACATTCGCTAAGATCCCCGGGATACTCATCCTA 
45 TTCTCCACACTTATCAGTGTCAGCTGATAAGGACACAATGTCTATGCACTCACAGACTAGTCGACGACCTTCTTC 
ACAAAAACCAAGCTATTCAGGCCAATTTCATTCACTTGATCGTAAATGCCACCTTCAAGAGTTCACATCCACCGA 
GCACAGAATGGCGGCTCTCTTGAGCCCGAGACGGGTGCCGAACTCGATGTCGAAATATGATTCTTCAGGATCC 
TACTCGGCGCGTTCCCGAGGTGGAAGCTCTACTGGTATCTATGGAGAGACGTTCCAACTGCACAGACTATCCGA 
TGAAAAATCCCCCGCACATTCTGCCAAAAGTGAGATGGGATCCCAACTATCACTGGCTAGCACGACAGCATATG 
5U GATCTCTCAATGAGAAGTACGAACATGCTATTCGGGACATGGCACGTGACTTGGAGTGTTACAAGAACACTGTC 
GACTCACTAACCAAGAAACAGGAGAACTATGGAGCATTGTTTGATCTTTTTGAGCAAAAGCTTAGAAAACTCACT 
CAACACATTGATCGATCCAACTTGAAGCCTGAAGAGGCAATACGATTCAGGCAGGACATTGCTCATTTGAGGGA 
TATTAGCAATCATCTTGCATCCAACTCAGCTCATGCTAACGAAGGCGCTGGTGAGCTTCTTCGTCAACCATCTCT 
GGAATCAGTTGCATCCCATCGATCATCGATGTCATCGTCGTCGAAAAGCAGCAAGCAGGAGAAGATCAGCTTGA 
55 GCTCGTTTGGCAAGAACAAGAAGAGCTGGATCCGCTCCTCACTCTCCAAGT7CACCAAGAAGAAGAACAAGAAC 
TACGACGAAGCACATATGCCATCAATTTCCGGATCTCAAGGAACTCTTGACAACATTGATGTGATTGAGTTGAAG 
CAAGAGCTCAAAGAACGCGATAGTGCACTTTACGAAGTCCGCCTTGACAATCTGGATCGTGCCCGCGAAGTTGA 
TGTTCTGAGGGAGACAGTGAACAAGTTGAAAACCGAGAACAAGCAATTAAAGAAAGAAGTGGACAAACTCACCA 
ACGGTCCAGCCACTCGTGCTTCTTCCCGCGCCTCAATTCCAGTTATCTACGACGATGAGCATGTCTATGATGCA 
OU GCGTGTAGCAGTACATCAGCTAGTCAATCTTCGAAACGATCCTCTGGCTGCAACTCAATCAAGGTTACTGTAAAC 
GTGGACATCGCTGGAGAAATCAGTTCGATCGTTAACCCGGACAAAGAGATAATCGTAGGATATCTTGCCATGTC 
AACCAGTCAGTCATGCTGGAAAGACATTGATGTTTCTATTCTAGGACTATTTGAAGTCTACCTATCCAGAATTGAT 
GTGG AGCATCAAC TTCGAA TCGATGC TCGTGA 1 1 C I A I UC M feGC I ATCAAATTGGTGAACTTCGACGCGTCATT 

ggagactccacaaccatgataaccagccatccaactgacattcttacttcctcaactacaatccgaatgttcatg 
05 cacggtgccgcacagagtcgcgtagacagtctggtccttgatatgcttcttccaaagcaaatgattctccaact 
cgtcaagtcaattttgacagagagacgtctggtgttagctggagcaactggaattggaaagagcaaactggcga 
agaccctggctgcttatgtatctattcgaacaaatcaatccgaagatagtattgttaatatcagcattcctgaaaa 
caataaagaagaattgcttcaagtggaacgacgcctggaaaagatcttgagaagcaaagaatcatgcatcgtaat 

/v ggtccatttgtagtatgcacagtcaaccgatatcaaatccctgagcttcaaattcaccacaatttcaaaatgtca 
attgagaaaacgaattctgttgatgtgacagttggtccaagagcatgcttgaactgtcctctaactgtcgatgga 
A^^ff£ C ^ mCCCAGA>CCTCCTC ^^ 



5AGGCTTTT7TGGA 
AGACAGGATGAGG 

ss^s AC6cnMTCCGacT ^ 



GACCCATGGCGATG 
iGGCTGGGTGTGGC 

S?-^ C ^ G ^ 



^* G ^^ G ^*^^^^^^*Q^CTCAAGGCGCGCATGCCCGACGGCGAGG*TCTCGTCGTGAC^;ATGGCGATa 



? A I? A ?^in CAC ^TAAAGCATTTTTTTCACTGCATTCTAGTTGTGGmGTCCAAACTCA 



£ AG f GGA ™ GCAGG *A*GAACATGTGAGCAAAAGa^^ 
G fo G ^ G JTiy CCATAGGCTCCGCCCCCCTGACGAGCA TCACAAAAATCGACGC^ 
A ^^^^ G ^^ AGG *^J*^** A '^^*^^*GQCGTTTCCCCCTGGAAGCTCCCTCGTGCC 
G f T ^V^ G - G ^ TA i :CTGTCCGCCmCTCCCncGG ^OCGTGGCGCmCTC 
TATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCC 
^GCCTTATCCGGTAACTATCGTCTTGACTCCAACCCGCT^ 



3GGAAGCGTGGCGCTTTCTCAATGCTCACGCTG 
3GCTGTGTGCACGAACCCCCCGTTCAGCCCGAC 

:ggtaagacacgacttatcgccactggcagcag< 

a £ t c c tag . aa c ^ a ^^ 

^GGTCATGA^™ 



gcagca£gca^aa^ 

TCAncTGAGAA^ 
Sc^Sa^ 



G^C^IAGTA^^^^ 
Met Thr Thr Ser Asn Val Glu Leu Ile Pro Ile
Tyr Thr Asp Trp 15

Ala Asn Arg His Leu Ser Lys Gly Ser Leu Ser
Lys Ser Ile Arg 30

Asp Ile Ser Asn Asp Phe Arg Asp Tyr Arg Leu
Val Ser Gln Leu 45

Ile Asn Val Ile Val Pro Ile Asn Glu Phe Ser
Pro Ala Phe Thr 60

Lys Arg Leu Ala Lys Ile Thr Ser Asn Leu Asp
Gly Leu Glu Thr 75

Cys Leu Asp Tyr Leu Lys Asn Leu Gly Leu Asp
Cys Ser Lys Leu 90

Thr Lys Thr Asp Ile Asp Ser Gly Asn Leu Gly
Ala Val Leu Gln 105

Leu Leu Phe Leu Leu Ser Thr Tyr Lys Gln Lys
Leu Arg Gln Leu 120

Lys Lys Asp Gln Lys Lys Leu Glu Gln Leu Pro
Thr Ser Ile Met 135

Pro Pro Ala Val Ser Lys Leu Pro Ser Pro Arg
Val Ala Thr Ser 150

Ala Thr Ala Ser Ala Thr Asn Pro Asn Ser Asn
Phe Pro Gln Met 165

Ser Thr Ser Arg Leu Gln Thr Pro Gln Ser Arg
Ile Ser Lys Ile 180

Asp Ser Ser Lys Ile Gly Ile Lys Pro Lys Thr
Ser Gly Leu Lys 195

Pro Pro Ser Ser Ser Thr Thr Ser Ser Asn Asn
Thr Asn Ser Phe 210

Arg Pro Ser Ser Arg Ser Ser Gly Asn Asn Asn
Val Gly Ser Thr 225

Ile Ser Thr Ser Ala Lys Ser Leu Glu Ser Ser
Ser Thr Tyr Ser 240

Ser Ile Ser Asn Leu Asn Arg Pro Thr Ser Gln
Leu Gln Lys Pro 255

Ser Arg Pro Gln Thr Gln Leu Val Arg Val Ala
Thr Thr Thr Lys 270

Ile Gly Ser Ser Lys Leu Ala Ala Pro Lys Ala
Val Ser Thr Pro 285

Lys Leu Ala Ser Val Lys Thr Ile Gly Ala Lys
Gln Glu Pro Asp 300

Asn Ser Gly Gly Gly Gly Gly Gly Met Leu Lys
Leu Lys Leu Phe 315

Ser Ser Lys Asn Pro Ser Ser Ser Ser Asn Ser
Pro Gln Pro Thr 330

Arg Lys Ala Ala Ala Val Pro Gln Gln Gln Thr
Leu Ser Lys Ile 345

Ala Ala Pro Val Lys Ser Gly Leu Lys Pro Pro
Thr Ser Lys Leu 360

Gly Ser Ala Thr Ser Met Ser Lys Leu Cys Thr
Pro Lys Val Ser 375

Tyr Arg Lys Thr Asp Ala Pro Ile Ile Ser Gln
Gln Asp Ser Lys 390

Arg Cys Ser Lys Ser Ser Glu Glu Glu Ser Gly
Tyr Ala Gly Phe 405

Asn Ser Thr Ser Pro Thr Ser Ser Ser Thr Glu
Gly Ser Leu Ser 420

Met His Ser Thr Ser Ser Lys Ser Ser Thr Ser
Asp Glu Lys Ser 435

Pro Ser Ser Asp Asp Leu Thr Leu Asn Ala Ser
Ile Val Thr Ala 450

Ile Arg Gln Pro Ile Ala Ala Thr Pro Val Ser
Pro Asn Ile Ile 465

Asn Lys Pro Val Glu Glu Lys Pro Thr Leu Ala
Val Lys Gly Val 480

Lys Ser Thr Ala Lys Lys Asp Pro Pro Pro Ala
Val Pro Pro Arg 495

Asp Thr Gln Pro Thr Ile Gly Val Val Ser Pro
Ile Met Ala His 510

Lys Lys Leu Thr Asn Asp Pro Val Ile Ser Glu
Lys Pro Glu Pro 525

Glu Lys Leu Gln Ser Met Ser Ile Asp Thr Thr
Asp Val Pro Pro 540

Leu Pro Pro Leu Lys Ser Val Val Pro Leu Lys
Met Thr Ser Ile 555

Arg Gln Pro Pro Thr Tyr Asp Val Leu Leu Lys
Gln Gly Lys Ile 570

Thr Ser Pro Val Lys Ser Phe Gly Tyr Glu Gln
Ser Ser Ala Ser 585

Glu Asp Ser Ile Val Ala His Ala Ser Ala Gln
Val Thr Pro Pro 600

Thr Lys Thr Ser Gly Asn His Ser Leu Glu Arg
Arg Met Gly Lys 615

Asn Lys Thr Ser Glu Ser Ser Gly Tyr Thr Ser
Asp Ala Gly Val 630

Ala Met Cys Ala Lys Met Arg Glu Lys Leu Lys
Glu Tyr Asp Asp 645

Met Thr Arg Arg Ala Gln Asn Gly Tyr Pro Asp
Asn Phe Glu Asp 660

Ser Ser Ser Leu Ser Ser Gly Ile Ser Asp Asn
Asn Glu Leu Asp 675

Asp Ile Ser Thr Asp Asp Leu Ser Gly Val Asp
Met Ala Thr Val 690

Ala Ser Lys His Ser Asp Tyr Ser His Phe Val
Arg His Pro Thr 705

Ser Ser Ser Ser Lys Pro Arg Val Pro Ser Arg
Ser Ser Thr Ser 720

Val Asp Ser Arg Ser Arg Ala Glu Gln Glu Asn
Val Tyr Lys Leu 735

Leu Ser Gln Cys Arg Thr Ser Gln Arg Gly Ala
Ala Ala Thr Ser 750

Thr Phe Gly Gln His Ser Leu Arg Ser Pro Gly
Tyr Ser Ser Tyr 765

Ser Pro His Leu Ser Val Ser Ala Asp Lys Asp
Thr Met Ser Met 780






His Ser Gln Thr Ser Arg Arg Pro Ser Ser Gln
Lys Pro Ser Tyr 795

Ser Gly Gln Phe His Ser Leu Asp Arg Lys Cys
His Leu Gln Glu 810

Phe Thr Ser Thr Glu His Arg Met Ala Ala Leu
Leu Ser Pro Arg 825

Arg Val Pro Asn Ser Met Ser Lys Tyr Asp Ser
Ser Gly Ser Tyr 840

Ser Ala Arg Ser Arg Gly Gly Ser Ser Thr Gly
Ile Tyr Gly Glu 855

Thr Phe Gln Leu His Arg Leu Ser Asp Glu Lys
Ser Pro Ala His 870

Ser Ala Lys Ser Glu Met Gly Ser Gln Leu Ser
Leu Ala Ser Thr 885

Thr Ala Tyr Gly Ser Leu Asn Glu Lys Tyr Glu
His Ala Ile Arg 900

Asp Met Ala Arg Asp Leu Glu Cys Tyr Lys Asn
Thr Val Asp Ser 915

Leu Thr Lys Lys Gln Glu Asn Tyr Gly Ala Leu
Phe Asp Leu Phe 930

Glu Gln Lys Leu Arg Lys Leu Thr Gln His Ile
Asp Arg Ser Asn 945

Leu Lys Pro Glu Glu Ala Ile Arg Phe Arg Gln
Asp Ile Ala His 960

Leu Arg Asp Ile Ser Asn His Leu Ala Ser Asn
Ser Ala His Ala 975

Asn Glu Gly Ala Gly Glu Leu Leu Arg Gln Pro
Ser Leu Glu Ser 990

Val Ala Ser His Arg Ser Ser Met Ser Ser Ser
Ser Lys Ser Ser 1005

Lys Gln Glu Lys Ile Ser Leu Ser Ser Phe Gly
Lys Asn Lys Lys 1020

Ser Trp Ile Arg Ser Ser Leu Ser Lys Phe Thr
Lys Lys Lys Asn 1035

Lys Asn Tyr Asp Glu Ala His Met Pro Ser Ile
Ser Gly Ser Gln 1050



Gly Thr Leu Asp Asn Ile Asp Val Ile Glu Leu
Lys Gln Glu Leu 1065

Lys Glu Arg Asp Ser Ala Leu Tyr Glu Val Arg
Leu Asp Asn Leu 1080

Asp Arg Ala Arg Glu Val Asp Val Leu Arg Glu
Thr Val Asn Lys 1095

Leu Lys Thr Glu Asn Lys Gln Leu Lys Lys Glu
Val Asp Lys Leu 1110

Thr Asn Gly Pro Ala Thr Arg Ala Ser Ser Arg
Ala Ser Ile Pro 1125

Val Ile Tyr Asp Asp Glu His Val Tyr Asp Ala
Ala Cys Ser Ser 1140

Thr Ser Ala Ser Gln Ser Ser Lys Arg Ser Ser
Gly Cys Asn Ser 1155

Ile Lys Val Thr Val Asn Val Asp Ile Ala Gly
Glu Ile Ser Ser 1170

Ile Val Asn Pro Asp Lys Glu Ile Ile Val Gly
Tyr Leu Ala Met 1185

Ser Thr Ser Gln Ser Cys Trp Lys Asp Ile Asp
Val Ser Ile Leu 1200

Gly Leu Phe Glu Val Tyr Leu Ser Arg Ile Asp
Val Glu His Gln 1215

Leu Gly Ile Asp Ala Arg Asp Ser Ile Leu Gly
Tyr Gln Ile Gly 1230

Glu Leu Arg Arg Val Ile Gly Asp Ser Thr Thr
Met Ile Thr Ser 1245

His Pro Thr Asp Ile Leu Thr Ser Ser Thr Thr
Ile Arg Met Phe 1260

Met His Gly Ala Ala Gln Ser Arg Val Asp Ser
Leu Val Leu Asp 1275

Met Leu Leu Pro Lys Gln Met Ile Leu Gln Leu
Val Lys Ser Ile 1290

Leu Thr Glu Arg Arg Leu Val Leu Ala Gly Ala
Thr Gly Ile Gly 1305

Lys Ser Lys Leu Ala Lys Thr Leu Ala Ala Tyr
Val Ser Ile Arg 1320






Thr Asn Gln Ser Glu Asp Ser Ile Val Asn Ile
Ser Ile Pro Glu 1335

Asn Asn Lys Glu Glu Leu Leu Gln Val Glu Arg
Arg Leu Glu Lys 1350

Ile Leu Arg Ser Lys Glu Ser Cys Ile Val Ile
Leu Asp Asn Ile 1365

Pro Lys Asn Arg Ile Ala Phe Val Val Ser Val
Phe Ala Asn Val 1380

Pro Leu Gln Asn Asn Glu Gly Pro Phe Val Val
Cys Thr Val Asn 1395

Arg Tyr Gln Ile Pro Glu Leu Gln Ile His His
Asn Phe Lys Met 1410

Ser Val Met Ser Asn Arg Leu Glu Gly Phe Ile
Leu Arg Tyr Leu 1425

Arg Arg Arg Ala Val Glu Asp Glu Tyr Arg Leu
Thr Val Gln Met 1440

Pro Ser Glu Leu Phe Lys Ile Ile Asp Phe Phe
Pro Ile Ala Leu 1455

Gln Ala Val Asn Asn Phe Ile Glu Lys Thr Asn
Ser Val Asp Val 1470

Thr Val Gly Pro Arg Ala Cys Leu Asn Cys Pro
Leu Thr Val Asp 1485

Gly Ser Arg Glu Trp Phe Ile Arg Leu Trp Asn
Glu Asn Phe Ile 1500

Pro Tyr Leu Glu Arg Val Ala Arg Asp Gly Lys
Lys Thr Phe Gly 1515

Arg Cys Thr Ser Phe Glu Asp Pro Thr Asp Ile
Val Ser Lys Lys 1530

Trp Pro Trp Phe Asp Gly Glu Asn Pro Glu Asn
Val Leu Lys Arg 1545

Leu Gln Leu Gln Asp Leu Val Pro Ser Pro Ala
Asn Ser Ser Arg 1560

Gln His Phe Asn Pro Leu Glu Ser Leu Ile Gln
Leu His Ala Thr 1575

Lys His Gln Thr Ile Asp Asn Ile
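The three-letter listing above can be compared with the one-letter query strings used in the BLAST reports by collapsing it to one-letter codes. The following is a minimal sketch, not part of the original filing: the function name `three_to_one` is hypothetical, and the only listing-specific assumption is that bare numbers are running residue counts.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def three_to_one(listing):
    """Collapse a three-letter residue listing into a one-letter string.

    Purely numeric tokens are treated as running residue counts and
    skipped. Any other unknown token raises KeyError, which is a
    convenient way to spot residues damaged during text extraction.
    """
    residues = []
    for token in listing.split():
        if token.isdigit():
            continue  # running residue count, not a residue
        residues.append(THREE_TO_ONE[token])
    return "".join(residues)


# First fifteen residues of the listing.
first_15 = three_to_one(
    "Met Thr Thr Ser Asn Val Glu Leu Ile Pro Ile\nTyr Thr Asp Trp 15"
)
print(first_15)
```

Run over the whole listing, a sketch like this gives a string whose positions can be checked against the `Query:` coordinates in the alignments that follow.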
TBLASTN search of the EST division of GenBank with the ORF of the
longest known Ce-UNC-53 cDNA, tb3-M5, reveals two ESTs with homology to a
predicted coiled-coil region in Ce-UNC-53.

TBLASTN 1.4.9MP [26-March-1996] [Build 14:27:13 Apr 1 1996]

Reference: Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers,
and David J. Lipman (1990). Basic local alignment search tool.
J. Mol. Biol. 215:403-10.

Query= tb3 M5 ORF
        (1553 letters)

Database: Non-redundant Database of GenBank EST Division
          647,253 sequences; 234,216,804 total letters.



35 



40 



5!? u f??;* producing High-scoring s«on»nt Pairs: 
dbJID357eoiCELX02SD6F C.elegans cDNA clone yk25d6 
db 3 |D33048lCELK025D6R C.elegans cONA clono Jk25d6 
OblH0903filM09D3fi ac7i -i u Z7"7 Z.. 



gb(H09036|H09036 
gblAA04 9124|AA04 9124 
gblR9147S|R91475 
gblT23446| T23446 
gb|R86390|R86390 
gb/T44781 IT44781 
gb|T75582lT75S82 



Reading High 
Frame Scor 

-1 



yl96cll.rl Homo sapiens cONA clo... *1 

si)46f04.rl Soaros mouse ombryo N... +3 

yq08cll.rl Homo sapiens cDNA clo... +2 

soq29S5 Homo sapiens cONA clono ... -1 

SW3ICA339SK Brugia malayi infect... +2 

8044 Arabidopsis thallana cDNA c. . . *1 

yd63fll.rl Homo saplons cONA clo... +2 



358 
177 



115 
106 
59 
61 
74 
71 
64 



Smallest 
Sum 
Probability 
P (N) N 
7.9e-S4 
8.6«-16 
i.le-05 
8 . 6o-0S 
0.21 
0.99 
0.996 
0.9992 
0.99992 



>gb|H09036|H09036 yl96c11.r1 Homo sapiens cDNA clone 46037 5'
Length = 489

Plus Strand HSPs:

Score = 115 (52.1 bits), Expect = 1.1e-05, P = 1.1e-05
Identities = 22/70 (31%), Positives = 45/70 (64%), Frame = +1

Query: 105 9 



9 ff{J 0 §{J"J SA {; Y J™{J N j; 0 ^ me 

Sb)ct: 7 MOLRNELRDKEHKLTOIRLEALSSAHOLOOL 



186 



Query:  1119 SSRASIPVIY 1128
              SR S P
Sbjct:   187 CSRCSFPSVH 216




>gb|AA049124|AA049124 mj46f04.r1 Soares mouse embryo NbME13.5 14.5 Mus
musculus cDNA clone 479167 5'
Length = 337

Plus Strand HSPs:

Score = 106 (48.0 bits), Expect = 8.6e-05, P = 8.6e-05
Identities = 23/58 (39%), Positives = 38/58 (65%), Frame = +3

Query:  1057 DVIELKQELKERDSALYEVRLDNLDRAREVDVLRETVNKLKTENKQLKKEVDKLTNGP 1114
1^, E w E ** L ** RL * L * A ** D LRET ♦ + *+ E LK E D*L P 

Jet. 99 EVStLRSELWEKEMKLTOlRLEALNSAHOLDOLRETMHNMOLEVOLLKAENDRLKVAP 272 



A search of the GenBank databases with part of the nucleotide-binding
domain of Ce-UNC-53 does not identify statistically significant proteins
except for the C. elegans cosmid containing Ce-unc-53.

TBLASTN 1.4.8MP [20-June-1995] [Build 18:00:05 Aug 29 1995]
Query= sections (240 letters)

>lcl|sections

ILTERRLVLAGATGIGKSKLAKTLAAYVSIRTNQSEDSIVNISIPENNKEELLQVERRLE
KILRSKESCIVILDNIPKNRIAFVVSVFANVPLQNNEGPFVVCTVNRYQIPELQIHHNFK
MSVMSNRLEGFILRYLRRRAVEDEYRLTVQMPSELFKIIDFFPIALQAVNNFIEKTNSVD
VTVCPRACLNCPLTVOCSRtWFIRLWNENFIPYLERVAROCKKNLRSLHFLRGSHRHRlJt 



Database: Non-redundant PDB+GBupdate+GenBank+EMBLupdate+EMBL
          520,303 sequences; 367,017,413 total letters.



Sequences producing 
cmb|Z47610ICEF4SEiC 
gb|R4l071 IRU071 
gb)T44781 IT44781 
embl248334ICEF10BS 
gb IMS 18 84 IEPFCPCG 
gblL09S47|PEAPCLP 
gb(M32604 ITOMC04B 
emb 1X691861 APTUSGA 
gblT447B2lT<«762 
gb I Ml 708 71 H-MRA5K:2 
emb 1X577021 CSNATR I 'JP 
gb I KOI 5 20 1 HUMRASXB1 



iigh-scaring Segment Pairs: 



Reading 
Frar 



Caenorhabditis eiegans cosmid F45 . . . -2 

H)c575-C Komo sapiens cONA clone k... *2 

8044 Arabldopsis thaliana cONA cl... *1 
Caenorhabditis elegans cosmid F10... +3 
Zpifagus virginiana chloroplast c... *1 
Pisura sativum (clone pCLp) nuclea... +1 
Tomato ATP-dependenc protease (CD. . . +1 
A.pyhllitidis mRNA for gamma-tubul in +2 

8045 Arabidopsis thaliana cONA cl... 4-1 
Human c-ras-Ki-2 activated oncoge... *2 
G.gaiius ANA for precursor of nat... *3 
Human lung adenocarcinoma (PR371) ... +2 



>gb|R41C71 IRUC71 Hfc575-f Homo sapiens cDNA clone *S7S-f. 
Length = 310

Plus Strand HSPs:

Score = 53 (24.5 bits), Expect = 0.40, Sum P(2) = 0.33
Identities = 9/15 (60%), Positives = 13/15 (86%), Frame = +2

Query:   130 GFILRYLRRRAVEDE 144
             GF++RYLRR+ VE +
Sbjct:    26 GFLVRYLRRKLVESD 70

Score = 47 (21.7 bits), Expect = 0.40, Sum P(2) = 0.33
Identities = 9/26 (34%), Positives = 17/26 (65%), Frame = +3

Query:   170 NNFIEKTNSVDVTVGPRACLNCPLTV 195
             + F+EK +++D  +GP   L+ PL +
Sbjct:   147 HTFLEKHSTLDFLIGPCFFLSGPLAL 224
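The "Identities" and "Positives" figures in these reports can be re-derived from the midline of an alignment, where a letter marks an identical residue and `+` marks a conservative substitution. The sketch below is an illustration only; the function names are hypothetical, and the midline string is a cleaned-up reading of the first alignment above rather than verbatim program output. The printed percentages appear to be truncated rather than rounded (23/58 is 39.7% but is reported as 39%), so integer division is used.

```python
def midline_counts(midline):
    """Return (identities, positives) for a BLAST-style midline.

    Letters are identical residues; '+' marks a conservative
    substitution; a space is a non-conservative mismatch.
    """
    identities = sum(1 for ch in midline if ch.isalpha())
    positives = identities + midline.count("+")
    return identities, positives


def blast_percent(matches, length):
    """Percentage as it appears in these reports (truncated, not rounded)."""
    return (100 * matches) // length


# Midline for query residues 130-144 against EST R41071 (cleaned-up reading).
identities, positives = midline_counts("GF++RYLRR+ VE +")
print(identities, positives)

# Figures quoted for the two EST hits in the coiled-coil search.
print(blast_percent(23, 58), blast_percent(38, 58))
print(blast_percent(22, 70), blast_percent(45, 70))
```

A check of this kind is useful mainly for catching digits that were damaged when the page was scanned.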



Smallest 
Sum 



High 


Probability 


Score 


P (N> N 


1131 


5.1e-158 2 


S3 


3.33 2 


74 


0.35 1 


71 


0.83 3 


49 


0.91 4 


71 


0.99 1 


71 


0. 99 1 


56 


0.992 3 


66 


0.9995 1 


58 


0.9998 1 


56 


0.9999 2 


57 


0.99995 1 






Three frame translation of EST gb:R41071. 

Regions of homology with Ce-UNC-53 in two different frames are underlined.

1 0 — — w*. »iw«iiwiu5jr id ui auuiiiir size as mat in C"£-UNC-53. 

Subsequent re-cloning and re-sequencing of this region in man identified multiple
sequencing errors in gb:R41071, and identified an ORF which is more homologous to
and co-linear with Ce-UNC-53 (see alignment in fig 12).






CTCCAACAACGTGGAGCCAGCCAATGGCTTCCTGGTTCGTTACCTGAGGAGGAAGCTGGT
        10        20        30        40        50        60
LQQRGASQWLPGSLPEEEAG
SNNVEPANGFLVRYLRRKLV
PTTWSQPMASWFVT*GGSW*

AGAGTCAGACAGCGACATCAATGCCAACAAGGAAGAGCTGCTTCGGGGTGCTCGACTTGG
        70        80        90       100       110       120
RVRQRHQCQQGRAASGCSTW
ESDSDINANKEELLRGARLG
SQTATSMPTRKSCFGVLDLG

GTACCCAAGCCTGTGGTATCATCTTCCACACCTTCCTTGAGAAGCACAGCACCTTAGACT
       130       140       150       160       170       180
VPKPVVSSSTPSLRSTAP*T
YPSLWYHLPHLP*EAQHLRL
TQACGIIFHTFLEKHSTLDF

TTCTCATCGGCCCTTGCTTCTTTCTGTCGGGTCCATTGGCATTGAGGCTTCCGGACCTTG
       190       200       210       220       230       240
FSSALASFCRVHWH*GFRTL
SHRPLLLSVGSIGIEASGPC
LIGPCFFLSGPLALRLPDLV

TTTATTGACCTGTGGACAACTCTATCATTTCCTATCTACAGGAGGAGCCAAGGATTGGAT
       250       260       270       280       290       300
FIDLWTTLSFPIYRRSQGLD
LLTCGQLYHFLSTGGAKDWI
Y*PVDNSIISYLQEEPRIG*

AAAGGTCCAT
       310
KGP
KVH
RS
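The three-frame translation above can be reproduced with the standard genetic code. The sketch below is an illustration and not the software used to prepare the figure; the names `CODON_TABLE`, `translate` and `est_start` are hypothetical. Stop codons are written as `*`, and a codon containing anything other than A, C, G or T is written as `X`.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame):
    """Translate the forward strand in reading frame 1, 2 or 3.

    Only complete codons are translated; a trailing partial codon
    is ignored.
    """
    dna = dna.upper()
    start = frame - 1
    codons = (dna[i:i + 3] for i in range(start, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First 60 nucleotides of EST R41071 as listed above.
est_start = "CTCCAACAACGTGGAGCCAGCCAATGGCTTCCTGGTTCGTTACCTGAGGAGGAAGCTGGT"

for frame in (1, 2, 3):
    print(frame, translate(est_start, frame))
```

Frames 2 and 3 come out one residue shorter here than in the figure because the figure also shows the codon that straddles the line break, whereas this sketch only sees the first 60 nucleotides.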






BLASTN search of the EST division of GenBank with Hu-unc-53/1 cDNA 3b.

BLASTN 1.4.9MP [26-March-1996] [Build 14:27:0? Apr 1 1996]

Query= Hu-unc-53/1 cDNA 3b
        (3256 letters)

Database: Non-redundant Database of GenBank EST Division
          647,253 sequences; 234,216,808 total letters.



Sequences producing 

gb|N36659IN36659 

53/1 

?OlAA04 3997|AA043997 
9b|AA04 912 4IAAO4912 4 
gblT0SS60lT0SS60 
gbtN24661 IN24681 
9OIR41071 (R41071 
9blN89104 (N89104 
?b!R4l073|R41073 
gblR15492 I R15492 
gblH09036lH09036 
gb|W91567fW91S67 
9OIW74400IW74400 
qblAA003314 IAA003314 



High-scoring Segment Pairs: 
yx91fcC9.rl Kcmo sapiens cONA clone 2.. 

xkSBaOl.rl Soares pregnant uterus Nb. . 
m346.34.rl Soares mouse embryo NbMEl . 
ESTC3449 Homo sapiens cOMA clone HFB. . 
yx91fa39.sl Homo sapiens cONA clone 2 
'HVAZ t J iomo a *Pi*na COMA clone k57S-£ 
2J?if r /S tai heart. Lambda ZAP Expre.. 
CSi S i~£ !! omo J *P i «"» cDNA clone *144-f 
HH4J4-F Homo sapiens cONA clone H434-F 
yl96cll.rl Homo sapiens cONA clone 4.. 
MTA.C36.093.A MTA adult mouse thymus.. 
«d62c:o.rl Soares fetal heart NbHHl 9 . . 
mgSfihlO.rl Soares mouse embryo NbMEl 



Smallest 

Sum 



High 


Prcoabili 




Score 


P<N) 


N 


1666 


2.1e-130 


1 


1316 


8.3e»129 


3 


1324 


9.1e-102 


1 


692 


S.Ie-84 


3 


782 


9. 9e-75 


2 


S3S 


1.5e-72 


4 


451 


7.3e-57 


2 


555 


1.5e-36 


» 


416 


2.3e-29 


2 


438 


9.4e-26 


1 


317 


1.9e-i7 


2 


243 


2.Ze-09 


1 


141 


0. 54 


1 



LOCUS 
assignment 
hu-UNC- 



-UNC-53/1 
-UNC-53/1 
-UNC-S3/1 
■UNC-53/1 
■UNC-S3/1 
•UNC-53/1 
•UNC-53/1 
UNC-53/1 
OKC-S3/2 
OMC-53/? 
UNC-S3/1 



TBLASTN search of the GenBank sequence database with the 961-residue
sequence of hu-UNC-53/1. hu-UNC-53/1 forms a unique pair
with Ce-UNC-53 (cosmid F45E10) compared to the rest of the database.

TBLASTN 1.4.9MP [26-March-1996] [Build 14:27:13 Apr 1 1996]

Query= tmpseq_1
        (961 letters)

Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences
          261,674 sequences; 371,416,172 total letters.



                                                                       Smallest
                                                                         Sum
                                                         Reading  High  Probability
Sequences producing High-scoring Segment Pairs:            Frame Score  P(N)      N

emb|Z47810|CEF45E10   Caenorhabditis elegans cosmid F45E10   -2    1?8  2.3e-32
gb|M97501|HUMCLIP     Human cytoplasmic linker protein-...   +3     83  0.47
emb|X64638|HSRESTIN   H.sapiens mRNA for restin              +1     83  0.47
gb|M58752|ECOMCRBC    E.coli mcrB and mcrC genes, compl...   +3     82  6
emb|Z11582|SCNUF1C    S.cerevisiae nuf1 gene                 +1     82  61
emb|X73297|SCSETRP4   S.cerevisiae spacer element            +1     82  74
emb|X54002|XLKINESIN  X.laevis mRNA for kinesin              +2     63  65
gb|U42409|DDU42409    Dictyostelium discoideum myosin h...   +3     66  92
gb|U10399|YSCH8082    Saccharomyces cerevisiae chromoso...   +2     77  93
gb|U20810|ATU20810    Arabidopsis thaliana cytoskeleton...   +1     77  95
gb|L07879|LEIKINLIKE  Leishmania chagasi kinesin-like p...   +2     78  95
gb|L031??|YSCINTANA   Saccharomyces cerevisiae integrin...   +2     65  0.997
gb|U28372|YSC09476    Saccharomyces cerevisiae chromoso...   +3     82  0.9991
gb|M94362|HUMLAMBBA   Human lamin B2 (LAMB2) mRNA, part...   +1     75  0.9996
gb|M58337|VACHACMA    Vaccinia virus hemagglutinin gene.     +1     74  0.99995






HUMAN UNC53-1: 3700bp

Human heart cDNA                   nt 1-3241
Human colorectal adenocarcinoma    nt 15710300
Human brain cDNA                   nt 1743-1374
Human brain cDNA                   nt 1733-3337
Human heart cDNA                   nt ISB5O706



Human UNC53-2 



Figure 8 






A B C D E

ORF 1627 aa (4.0exp-34 with Ce-Unc53)

Clones: pLM1; pHH14-3; lambda HH3b; pCB210-14; lambda CAD17; lambda HH15; pCB212

HU-Unc53/1 : 6.1 kb



Figure 8a 






HUMAN UNC53-1: 3700bp 

N36659
AA043997
T05560
N24681
R41071
N89104
R4107J
W91567
W74400



Source:

Human                    nt 2775-3115
Human pregnant uterus    nt 2584-2969
Mouse embryo             nt 1067-1407
Human                    nt 2803-3050
Human                    nt 3000-3200
Human                    nt 2284-2599
Human fetal heart        nt 3042-3247
Human                    nt 3131-3247
Human                    nt 1579-1669
Human                    nt 1771-1937
Mouse thymus adult       nt 3114-3235
Human fetal heart        nt 3192-3247



Figure 8b 



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GRfiTTCCGGGGRRGAGCTGACnGGflGTGGAAGeCC«^ )0Q 

\f\& S G S P " fl ° ° L ° S " ° * ° * M T L P * K C <* » 
TACCRGCrTCRGTCCCRGGAGGRGRCCRRGGRGRGGCGRCflTTCCCRTRCCRT TGGTGGGCTGCC TGflfl TCCGATGRCCAG TCRGRGC TCCCTTC TCCCC 200 
VQLOSOCETrtRR M S MTIGGLPESOOOSCiPSP 

CTGCRCTrCCCRTGTCTCTGRGTGCnRRGGGCCRflCTTRCCflflCflTflGTGflGTCCCRCTGCGGCCRCCflCGCCRflGRflTCflCCCCCrCCflflCRGCnTCCC 300 
PRLPOSLSftSGOLTNIVSPTRRTTPdlTftSHSIP 

CRCCCRCGRGGCGGCC TTCGRGC TG fRCRGCGGC TCCCRflfl TGGGGAGCRCCC TGTCCC TGG CC G RG RG RCC CR RG GGRR TGR TTCGGTCflGGRTCC TTC »00 
THERRfELVSGSQrtGSTLSLRERPKGniPSGSr 

CGRGRCCCCflCGGRCGRTGTTCflCGGCTCRGTGCTGTCCCTGGCCTCCflGTGCCrcCTCCRCCTRCTCCTCRGCTGflGGflGflGGRTGCBflrCTGfiGCflnR 500 
« 0 P T 0 0 V N GSVlStflSSRSS TYSSRCEBnOSCQ 

homology block A

TCCGGRflGCTTCGTRGGGflflCTGGAflrCfiTCCCflGGflRRfiflGTGGCCRCCTTGRCGTCrCRGCTTTCTGCCRflTGCTRRTCTGGTGGCTGCTTTTGRGCfi 600 
1 B K I RRELESSQEKVRTL TSOLSRNRWLvRBfCO 

homology block A

GaGCCTGGTGflnTRTGRCflTCCCCCCTGCGflCRCCrGGCflGRGflCGGCCGRGGflGRflGGflCRCTGRGCTGCTGGflTrTGCGRGflflRCCRrRGRCTTTCTG 700 
SL VMnrSaLRHLRETBEEKOTELLOLPETIOrL 

homology block A

RflGflflRflflGRRCTCTGRGGCCCRGGCRGTCflTrCRGGGRGCCCTTflRrGCCTCflGflflRCCRCRCCCRflHGflflCTTCGGRTCRPGflGflCRflRflCTCCTCRG 800 

KKKWSEflORV IQGRLNRS E T T P K £ L fl I KftQMSS 
homology block A

flTRGCflTCTCRRGCCTCflflCRGCRTCRCTRGCCflrrCCRGCRTCGGCRGCflGCRRGGRTGCTGflTCCGRflflRRGflflGRRRflRRRflGRGTTGGGTCTflTGfl 900 
OSlSSLNSITSHSSIGSSKDROflKlC tt t t K SVVYE 

homology block B

GCTTCGRRGTTCCTTCRRCRRRGCGrTCflGTRTRflRRflRGGGGCCCflRGTCRGCTTCCTCflTRCTCGGRTRTflGflGGRGnTTGCTflCflCCCGRCTCTTCfl 1000 
L W S S ' » < "fSIKKGPKSRSSVSOIEEIRrPOSS 
homology block B

CCCCCCTCflTCCCCCRRRCTRCRGCRTGGrTCTRCflGRGRCTGCTTCRCCCTCCfiTCflRGrCCTCCflCCTTCTCCTCCGTGGGCRCTGRTGTCRCCGflGG 1 100 
RPSSPKLOHGSTETHSPSIKSSTLSSVGTOVTE 

CCCCTGCTCRCCCRGCCCCCCflCflCTRGGCTGrTCCRTGCflflflTGRGGRGGfiGGRGCCRGflGflflGRRGGRGGTRTCGGflGCTGCGCrCTGRGCTflTGGGR 1200 
GPRHPRPHTflLfHRHEEEEP EKKEVSELRSELVC 

homology block C

GflflGGflflRTGRRGCnflCRGflCRTCCGCTTGGflGGCCCTCRflCTCTGCCCflCCRRCTGGflTCflGCTTCGGGRGRCCHTGCflCRflCflTGCflGTTGGRGGTG 1300 
K C fl K L ^OIRLERLWSRHOLOOLRETnMNnQLEV 

homology block C 

GflCCTGCTGRRRCCflGRGRflTGRCCGaCTGflR6GTRGCCCCRGGCCCCTCflTCflGGCTCCflCrcCRGGGCflGGTCCCTGGflTCflTCTGCRT7RTCrTCCC UOO 
OtlKRENQfll K VRPGPSSGS TPGOVPGSSRLSS 
homology block C

weak homology in hum1 vs hum2

CflCGCCGCTCCCTRGGCCrGGCRCTCflCCCaTTCCrTCGGCCCCflGTCTTGCflGRCRCRGflCCTGTCflCCCflTGGflrGGCRTCRGTflCTTGTGGTCCflRfl ISOO 
PRRSLGLALTHSFGPSLADTDLSPMDGISTCGPK
weak homology in hum1 vs hum2

GGRGGRRGTGRCCC7CCGGGTGGTGG:cSGGnTGCCCCCGCflGCflCflTCRrCRflflGGGGRCTTGRflGCRGCRGGflflrrCTTCCTGGGCIGmGCflRGGTC 1600 

E e vt LRvvvanppQM ( ikgol KQOErrcGCSJcv 



homology block D

weak homology in hum1 vs hum2

RGTGGRRflRGrTGRCTGuRflGaTGCrGGSTcaflGCTGTTTTCCRRGTGTrCflRGGRCTflTRTTTCTRRRflTGGRCCCRGCCTCTfiCCCTGCGRCTfiRGCR 1?00 

SG « v Q*<"i.CEAVfovrKOY i s * n o p a $ t g 1 s 

homology block D

C:G^GTCCflTCCRrGGCTSCnGCfl'CRG:;«CGTGflflRCGRGTGTTGGflTGCflGRGCCCCCCGnGRrGCCTCCrrGCCGTCGaGGTGTCRflrRRCflTflTC 1800 
TtSlw ^ v SISH VK fiy LOREPPCnPPCPOGVNNlS 
homology block D






I — — =— 1 tP « PnnoHV| s l l l 

homology block E - pred nucleotide BO




; ■ Namotogy a ecu 6 . prea nucleotide 60 ■ C ' Y N 

~~ — — — . — z, !I "Gi NLsr w n i t r s n m 

homology block E - pred nucleotide BO

ftcnMioqyttocfcE.pfqd nucl eotide BO < C C I L Bj q 

y rLEKMSr so f L ICP C F f t. S C P 
*"w«ogy Hoc* E . prep nucleotide 60 ' 

— " LVWWSlt p VlQCGflKO G I K V H 

homology block E - pred nucleotide BO

' aDT *- p VPSfl O QOQSr i v h l P P 

homology block E - pred nucleotide BO

homology block E - pred nucleotide BO

— 1 * 5 p 0 H CT 'LDPNLQflTl. 1 ~" 




3' untranslated trailer

3' untranslated trailer

j»«««^fC,« WW ff T-^^ CTICCCMrraTOTmMr r cCflcHMTCTGECTf , f , Br , rffnKnoo _ 

3' untranslated trailer

3' untranslated trailer

^ ^GftGCTGCCCGCTCCTGCCCTCTGGflTGPCRmG G GGRCBTCRnCflflGflCGGCrcrr^ rrrr.^car. 360o 

3' untranslated trailer

I£«^c^^ ccr.. taK:tTC ^ w ^ wrMctCT6TCTTTEWf oar rrWBCflWTCCBCCmfla „„ rfTTrKar 

" ■ 3' untranslated tra iler - ■ - — 
TTflRCC 3706 " " ' 






Tuesday, 18 November 1997 10:33

Hu-Unc53/1 seq (1 > 6013) Site and Sequence

Enzymes: 60 of 148 enzymes (Filtered)

Settings: Linear, Certain Sites Only, Standard Genetic Code

GATATCTGCAGAATTCGGCTTCTTTGAGCAAGTTCAGCCTGGTTAAGTCCAAGCTGAATTCCGGGGAAAGCCGAGCCGGATCCCTCGAGGACCCTATGCG

100

CTATAGACGTCTTAAGCCGAAGAAACTCGTTCAAGTCGGACCAATTCAGGTTCGACTTAAGGCCCCTTTCGGCTCGGCCTAGGGAGCTCCTGGGATACGC



- pCR2.1 linker 1 lambda gt10 primer EcoRI 



suspect sequence linker?    pHH14-3



ISAEFGFFEQVQPG.VQAEFRGKPSRIPRGPYA
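
Each block of this listing shows a top strand, its complementary strand and a one-letter translation under the "Standard Genetic Code" named in the settings line. As a rough illustration only, the Python sketch below recomputes the complementary strand and the translation for the first 41 nucleotides printed above. The function names `complement` and `translate`, and the choice of reading offset 1 (starting at the second base), are assumptions made for this example and are not part of the original listing.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def complement(seq: str) -> str:
    """Return the base-paired strand, written in the same left-to-right order."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))


def translate(seq: str, offset: int = 0) -> str:
    """Translate complete codons starting at `offset`; unknown codons give 'X'."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First 41 nucleotides of the top strand of the listing (hypothetical excerpt length).
top = "GATATCTGCAGAATTCGGCTTCTTTGAGCAAGTTCAGCCTG"

print(complement(top))         # lower strand of the listing
print(translate(top, offset=1))  # one-letter translation row
```

Comparing the printed rows with the listing is a quick way to spot character-recognition errors in either strand or in the translation row.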

GAGGTCAAGCCGCTCAGCAAGGCGCCTGAAGCGGCCGTGAGCGAAGATGGCAAATCGGACGACGAGCTGCTCTCCAGCAAGGCCAAGGCGCAAAAGAGCT 

200

CTCCAGTTCGGCGAGTCGTTCCGCGGACTTCGCCGGCACTCGCTTCTACCGTTTAGCCTGCTGCTCGACGAGAGGTCGTTCCGGTTCCGCGTTTTCTCGA 



-pHH14-3



EVKPLSKAPEAAVSEDGKSDDELLSSKAKAQKS 
CTGGGCCTGTCCCCTCTGCCAAGGGCCAGGAGGAGCGCGCCTTCCTCAAGGTGGACCCCGAGCTGGTGGTGACCGTGCTGGGAGACCTGGAGCAGCTGCT



GACCCGGACAGGGGAGACGGTTCCCGGTCCTCCTCGCGCGGAAGGAGTTCCACCTGGGGCTCGACCACCACTGGCACGACCCTCTGGACCTCGTCGACGA 



■pHH14-3 



SGPVPSAKGQEERAFLKVDPELVVTVLGDLEQLL

CTTCAGCCAGATGCTGGACCCAGAGTCCCAGAGAAAGAGGACAGTGCAGAATGTCCTGGATCTCCGGCAGAACCTGGAAGAGACCATGTCCAGCCTGCGA 

400

GAAGTCGGTCTACGACCTGGGTC TCAGGGTCTCTTTCTCCTGTCACGTCTTACAGGACCTAGAGGCCGTCTTGGACCTTCTCTGGTACAGG TCGGACGCT 



■pHH14-3 



-ORF (1-579bp) = pLM7 ORF



-full available ORF HU-Unc53/1 = pLM1 OR



FSQMLDPESQRKRTVQNVLDLRQNLEETMSSLR

GGGTCCCAGGTGACTCACAGCTCCCTGGAGATGACCTGCTACGACAGCGATGATGCCAACCCACGCAGCGTGTCCAGCCTCTCCAACCGCTCGTCCCCTC 

500

CCCAGGGTCCAC TGAGTGTCGAGGGACCTCTACTGGACGATGCTGTCGCTACTACGGTTGGGTGCGTCGCACAGGTCGGAGAGGTTGGCGAGCAGGGGAG 



■pHH14-3 



-ORF (1-579bp) = pLM7 ORF



-full available ORF HU-Unc53/1 = pLM1 OR



GSQVTHSSLEMTCYDSDDANPRSVSSLSNRSSP

TGTCATGGCGCTATGGCCAGTCCAGTCCGCGGCTGCAGGCTGGTGACGCGCCCTCTGTGGGTGGGAGCTGCCGCTCGGAGGGGACGCCCGCCTGGTACAT 

600

ACAGTACCGCGATACCGGTCAGGTCAGGCGCCGACGTCCGACCACTGCGCGGGAGACACCCACCCTCGACGGCGAGCCTCCCCTGCGGGCGGACCATGTA 



■pHH14-3 



-ORF (1-579bp) = pLM7 ORF



-full available ORF HU-Unc53/1 = pLM1 OR 



LSWRYGQSSPRLQAGDAPSVGGSCRSEGTPAWYM






CCACCaC G ,ACCG G CCCACTACrcCCACACCAT G CCCA T G C G CAGCCCC f CAA G CTCAGCCATATCTCCC C CC TG6A G CTGGT C n^rr^ T .-.. T ..-. 
COrGCCGCTTGCCCGGGTGATGAGGGTGTGGTACGGGTACGCGTCGGGGlcGTTCGAaTCG G.ATAGAGGGCGaACCTCGACCAGCTTAGGGACCTrL;- 







-full available ORF HU-Unc53/1 = pLMl OR 



. — w , . — psbiri i v^n — ■ 

" . G E . " A " Y 5 " T " P " R ^ P S K L S H , S R L E L V E S (. 0 3 



11 ■ ■ . . , , < — » *- >* K. V C» 

CATGAGGTGGACCTCAAGTCCGGCrACArG AGCGACA GT G ACCTCATGGGCAAGACCATGACGGAGCATGATGACATCACrACC GaCTGGGAX,^^, 
CTACTCCACCTGGAGTTCAGGCCGATGTACTCGCTGTCACTGGAGTA CCCGTTCTGGTACTGCCTCCTACTACTGTAGTnflTfinr.-r 




_ _ - hl.iv) t wn — 

^-^^TCAGTAGTGGACTCAGCGATGCCTCAGACAATCT CAGTTCAGAAGAATTCAATGCCAGCTCCTCAC TCAArrr^r . 

CCAGGTAGTCATCACCrGAGKGCTA^ ^ 




° nslpstpt« 



" "^1^!^^^^^^^^^^^^ ^^^^ ^^^^^^^^^GTGGGCTGAGCTGGTTTAnTftAA TrAf:*/- -. 

M>.oAGCGTCCTTGAGTTGTTATCACGATGC G TGTCTGAGrCTCTTC GC G AGr G ACCGTCTTTCACCCGACrCGACCAAATCAr TTA^Trrrr T/-TTT-^.~ ' i 




, 3 , RR | NST * VLR TP 3E K R SLAESGL SVF S E a e ri T" 



GbATTTTTTGACCTCATGCTGTCACCArCGGACTTCTACCTTGGACCC TGAAGATTCACCGCCTCCCTCGCCGGACTC TCGACACfACTAAGTAGGrTCC ' " 



-pHH14-3
-pCB212



-full available ORF HU-Unc53/1 = pLM1 OR




GTG GAGAACTGAAAAAGCCCATCAGCCTGGGCCACCCTGGTTCCCTGAAGAAGGGCAAGACCCCACCTGTGGCTGTAACTTC ^ 

CACCTCTTGACTTTTTCGGGTAGTCG6ACCCGGTGGGACCAAGGGACTTCTTCCCGTTCTGGGGTGGACACCGACATTGAAGGGGGTAGTGAGTGTGTCG 

— — pHH14-3 



pCB212 

full available ORF HU-Unc53/1 = pLM1 OR
GGELKKPISLGHPGSLKKGKTPPVAVTSPITHTA



CCAGAGTGCCCTCAAAGTCGCAGGCAAACCTGAGGGCAAAGCTAC AGACAAGGGTAAGCTTGCAGTGAAGAATACTGGGCTCCAACGCTCCTCCTCTGAT 
GGTCTCACGGGAGTTTCAGCGTCCGTTTGGACTCCCGTTTCGATGTCTGTTCCCATTCGAACGTCACTTCTTATGACCCGAGGTTGCGAGGAGGAGACTA 



pHH14-3 



pCB212 

full available ORF HU-Unc53/1 = pLM1 OR
QSALKVAGKPEGKATDKGKLAVKNTGLQRSSSD



GCTGGTCGGGACCGCCTGAGTGATGCTAAGAAGCCCCCCTCGGGCATTGCTCGCCCCTCCAC TTCGGGATCCTTTGGCTACAAGAAGCCTCCTCCTGCCA 

i i i i i i i — 1 1 i i i ' i 1 1 1 » ■ i 1 * 1 1 1- i acc 

CGACCAGCCCTGGCGGACTCACTACGATTCTTCGGGGGGAGCCCGTAACGAGCGGGGAGGTGAAGCCCTAGGAAACCGATGTTCTTCGGAGGAGGACGGT 



PHH14-3 



pCB212 

full available ORF HU-Unc53/1 = pLM1 OR

AGRDRLSDAKKPPSGIARPSTSGSFGYKKPPPA



CAGGCACAGCCACTGTCATGCAAACTGGTGGTTCAGCCACTCTCAGCAAGATCCAGAAGTCCTCAGGCATCCCTGTCAAGCCAGTAAATGGGCGCAAGAC 

, , | ; i ■ , — | , 1 ■ » 1 1 1 ' 1 1 ' 1 ' ■ 1 1 

G7CCGTGTCGGTGACAGTACGTTTGACCACCAAGTCGGTGAGAGTCGTTCTAGGTCTTCAGGAGTCCGTAGGGACAGTTCGGTCATTTACCCGCGTTCTS 



pHH14-3 



pCB212 



full available ORF HU-Unc53/1 = pLM1 OR

TGTATVMQTGGSATLSKIQKSSGIPVKPVNGRKT

TAGCTTAGATGTTTCCAACAGTGCAGAGCCAGGATTCCTGGCTCCTGGAGCCCGTTCTAACATCCAGTACCGCAGCCTGCCCCGGCCAGCCAAGTCAAG- 

i ! 1 1 ' ' I ' ' I " » 1 ' ' 1 ' 1 ' 1 1 i ■ I 

ATCGAATCTACAAAGGTTGTCACGTCTC6GTCCTAAGGACCGAGGACCTCGGGCAAGATTGTAGGTCATGGCGTCGGACGGGGCCGGTCGGTTCAGTTCA 
pHH14-3 1 



pCB212 

full available ORF HU-Unc53/1 = pLM1 OR

SLDVSNSAEPGFLAPGARSNIQYRSLPRPAKS






TC 



rAr G ,cc G r G ,cco G ccccccaa C r GG ,ccTcccccT G TCACCACCAC CA r T n flr TZ^Z; 



AG4TACTCGCAC 




" fu " avauable ORF HU-Unc53/1 = pLMl OR 
S . V T G G R G G P R p v s 



r*v primer HU53rv.j 

S S ' 0 P S L L S T K 0 



GACT G A AGG AGCCrACCAA GG TA G crAnrr.r.^.,. rirTrrnrr ' T ? - 





L D s 0 N , S , „ -^availableOHhHU-UncSa/l.pLMlOR- 
- * , ? 1 G S P E S T 



p K N Q A s H 



~-~ — ™ 



P T A T 



K L A E L 



P P T P L R A T a t VaUaUe UHh HU.Unc53/1 = pLM1 OR 
A ^ | S F y K P P S I A N |_ Q 



K V N S N S 



L D L P S 




CCAGTGATACCACCCATGCTTCAAAGGTCCCAGATCTGCATGCTACAAGCTCAGCATCTGGGGGCCCTCTCCCTTCCTGCTTCACCCCCAGTCCGGCACC

GGTCACTATGGTGGGTACGAAGTTTCCAGGGTCTAGACGTACGATGTTCGAGTCGTAGACCCCCGGGAGAGGGAAGGACGAAGTGGGGGTCAGGCCGTGG



-pHH14-3 



-pCB210-14 



full available ORF HU-Unc53/1 = pLM1 OR



SSDTTHASKVPDLHATSSASGGPLPSCFTPSPAP

CATCCTCAATATTAACTCAGCCAGCTTCTCCCAGGGCCTGGAGCTAATGAGTGGTTTCAGTGTGCCAAAAGAGACCCGCATGTACCCCAAACTCTCAGGC

GTAGGAGTTATAATTGAGTCGGTCGAAGAGGGTCCCGGACCTCGATTACTCACCAAAGTCACACGGTTTTCTCTGGGCGTACATGGGGTTTGAGAGTCCG



-pHH14-3 



•pCB210-14 



-full available ORF HU-Unc53/1 = pLM1 OR



ILNINSASFSQGLELMSGFSVPKETRMYPKLSG



CTGCACAGGAGCATGGAGTCCCTCCAGATGCCAATGAGCCTCCCCAGTGCCTTCCCCAGCAGTACTCCCGTCCCCACCCCACCTGCTCCCCCTGCTGCTC

GACGTGTCCTCGTACCTCAGGGAGGTCTACGGTTACTCGGAGGGGTCACGGAAGGGGTCGTCATGAGGGCAGGGGTGGGGTGGACGAGGGGGACGACGAG



-pHH14-3 



-pCB210-14 



-full available ORF HU-Unc53/1 = pLM1 OR 



LHRSMESLQMPMSLPSAFPSSTPVPTPPAPPAA

CCACAGAAGAAGAGACGGAAGAGCTGACTTGGAGTGGAAGCCCC AGAGCTGGGCAACTGGACAGTAATCAGCGGGATCGGAACACTCTTCCCAAGAAAGG _ 
GGTGTCTTCTTCTCTGCCTTCTCGACTGAACCTCACCTTCGGGGTCTCGACCCGTTGACCTGTCATTAGTCGCCCTAGCCTTGTGAGAAGGGTTCTTTCC 



-pHH14-3



• pHH3b 



-pCB210-14



-full available ORF HU-Unc53/1 = pLM1 OR



rev primer HU53rv2



-peptide B72S28H 



PTEEETEELTWSGSPRAGQLDSNQRDRNTLPKKG






C C,e. M ,,CC.CC,,C. 5 ,CCC» e . Mr cC« ^ ae c C «A,, C CC.T.c e ,,w CC „, C r r ,..., ff ..„ .-,„ , , ' 




-full available ORF HU-Unc53/1 = P LM1 OR 



LRYQLQSQEET 



TCTCCCCCTGCACrTCCCATGTC ^^^^^|^ AAA GGGCCA ACTTACCAACATAGTGAGTCCCACTGCG nrrArr Arr rrrTrr 

AGAGG6GGACGTGAAGGGTACAGAGACTCACGTT ' ' ~* ' ' ^V AAGAATCACt -CGCTCCAAL. 




•full available ORF HU-Unc53/1 = P LM1 OR* 
SP?ALf> MSLSAKGQLTN 




3 I P T H E A 



A F E L Y S G S Q M G S T L S L 



A ERPKGMIR 



ATCCTTCCGAGACCCCACGGACGATGTTCACGGCTCAGTGCTGTrrrTmrrrrr a/- 



S G 



TAGGAAGGCTCTGGGGTGCCTGCTACAAGTGCCGAGTCACGACAGGGACCGGAGGTr ArnrA^rAr r tj-^-a 



TGCCTCCTCCACCTACTCCTCAGCTGAGGAGAGGATBrAATrT 



AGGAGTCGACTCCTCTCCTACGTTA 




5 F R D P T D D V H G 



full available ORF Hi u irw^/i - r » M1 on 
3 V L S L A S S A S S T y S 



3AEERMQS 



GAGCAAATCCGGAAGCT 



, CGTTTA jGCCTTCGAAGCATCCCTTGACCTTAGTAGG GTCrTTTTTr Arrr/?Tf>- * ^ r n rTr r ' ■ ■ ■ ► 




c 0 I « K L R R 



ELE 5SQEKVA 



T L f S 0 L 



N A N L V A A 




TTGAGCAGAGCCTGGTGAATATGACATCCCGCCTGCGACACCTGGCAGAGACGGCCGAGGAGAAGGACACTGAGCTGCTGGATTTGCGAGAAACCATAGA
AACTCGTCTCGGACCACTTATACTGTAGGGCGGACGCTGTGGACCGTCTCTGCCGGCTCCTCTTCCTGTGACTCGACGACCTAAACGCTCTTTGGTATCT



-U2 ORF = pCB251 ORF



- pHH3b 



-full available ORF HU-Unc53/1 = pLM1 OR



FEQSLVNMTSRLRHLAETAEEKDTELLDLRETID
CTTTCTGAAGAAAAAGAACTCTGAGGCCCAGG^ 

GAAAGACTTCTTTTTCTTGAGACTCCGGGTCCGTCAG TAAGTCCCTCGGGAATTACGGAGTC TTTGGTGTGGGTTTCTTGAAGCCTAGTTCTCTGTTTTG 



-U2 ORF = pCB251 ORF



-pHH3b



-full available ORF HU-Unc53/1 = pLM1 OR



FLKKKNSEAQAVIQGALNASETTPKELRIKRQN

1 ' 1 ' 1 * n i. -1 ■ h . . 1 . . ■ . . . . . . 

TCCTCAGATAGCATCTCAAGCCTCAACAGCATCACTAGCCATTCCAGCATCGGCAGCAGCAAGGATGCTGATGCG 

AGGAGTCTATCGTAGAGTTCGGAGTTGTCGTAGTGATCGGTAAGGTCGTAGCCGTCGTCGTTCCTACGACTACGCTTTTTCTTCTTTTTTTTCTCAACCC ^ 



-U2 ORF = pCB251 ORF



-pHH3b 



-full available ORF HU-Unc53/1 = pLM1 OR



SSDSISSLNSITSHSSIGSSKDADAKKKKKKSW
TCTATGAGCTTCGAAGTTCCTTCAACAAAGCGTTCAGTATAAAAAAGGGGCCCAAGTCAGCTTCCTCATACTCGGATATAG AGG 

AGATACTCGAAGCTTCAAGGAAGTTGTTTCGCAAGTCATATTTTTTCCCCGGGTTCAGTCGAAGGAGTATGAGCCTATATCTCCTCTAACGATGTG3GCT 



-U2 ORF = pCB251 ORF



• pHH3b 



-full available ORF HU-Unc53/1 = pLM1 OR



VYELRSSFNKAFSIKKGPKSASSYSDIEEIATPD

CTCTTCAGCCCCCrCATCCCCCAAACTACAGCATGGTTCTACAGAGACTGCTTCACCCTCCATCAAGTCCTCCACCTTGTC CTCCGTGGGCACTGATGTL- 
GAGAAGTCGGGG GAGTAGGGGGTT TGATGTCG TACCAAGATGTCTCT GACGAAGTGGGAGGTAGTTCAGGAGGrGGAACAGGAGGCACCCGTGACTACAo 



-U2 ORF = pCB251 ORF



-pHH3b 



"~~ —full available ORF HU-Unc53/l = pLMI OR 

SSAPSSPKLQHGSTETASPSIKSSTLSSVGTD






ACCb ,3CCCTGCTCACCCAGCCCCCCACACTAGGCTGTTCCATGCAAATGAGGAGGAGGAGCCAGAGAAGAAGGAGGTArCGGAGCTGCGCTCTCiAG: 

, ; . 1 , 1 1 . 1 . ( , , , 1 , , k 3e: 

TGGCTCCCGGGACGAGTGGGTCGGGGGGTGTGATCCGACAAGGTACGTTTACTCCTCCTCCTCGGTCTCTTCTTCCTCCATAGCCTCGACGCGAGACTCG 



- U2 ORF = pCB2S1 OR? 



•pHH3b 



- full available ORF HU-Unc53/1 = pLMl OR 



TEGPAHPAPHTRLFHANEEEEPEKKEVSELRSE 

* . . . ■ ■ ■ ■ t - .I.-- i 

TATGGGAGAAGGAAATGAAGCTTACAGACATCCGCTTGGAGGCCCTCAACTCTGCCCACCAACTGGATC AGCTTCGGGAGACCATGCACAACATGCAGT" 

1 ■ ■ , i 1 1 1 1 1 . 1 1 1 1 1— h- i . r . . i 3£0C 

ATACCCTCTTCCTTTACTTCGAATGTCTGTAGGCGAACCTCCGGGAGTTGAGACGGGTGGTTGACCTAG TCGAAGCCCTCTGGTACGTGTTGTACGTCAA 



-U2 ORF = pCB251 ORF



-pHH3b 



-peptide B72627H



-full available ORF HU-Unc53/1 = pLM1 OR



-U3 ORF = pLM5 ORF 



LWEKEMKLTDIRLEALNSAHQLDQLRETMHNMQL

GGAGGTGGACCTGCTGAAAGCAGAGAATGACCGACTGAAGGTAGCCCCAGGCCCCTCATCAGGCTCCAC tccagggcagg tccctggatcatctgc atta 
1 1 . 1 . 1 1 1 1 i ■ ■ ■ i ' 1 ■ 1- 37CC 

cctccacctggacgactttcgtctcttactggctgacttccatcggggtccggggagtagtccgaggtgaggtcccgtccagggacctagtagacgtaat 



-U2 ORF = pCB251 ORF



-pHH3b 



•full available ORF HU-Unc53/1 = pLM1 OR 



-U3 ORF = pLM5 ORF 



EVDLLKAENDRLKVAPGPSSGSTPGQVPGSSAL
tcttccccacgccgctccctaggcctggcactcacccattccttcggccccagtcttgcagacacagacctgtcacccatggatggcatcagtacttgtg 

' 1- — iii' 1 — 1 . — i 1 i i i ■ ■ i , 1 1- V;:; 

AGAAGGGGTGCGGCGAGGGATCCGGACCGTGAGTGGGTAAGGAAGCCGGGGTCAGAACGTCTGTGTCTGGACAGTGGGTACCTACCGTAGTCATGAACAw 



-U2 ORF = pCB251 ORF



-pHH3b 



full available ORF HU-Unc53/1 = pLM1 OR



-U3 ORF = pLM5 ORF 



SSPRRSLGLALTHSFGPSLADTDLSPMDGIST







TTCGTCGTCCTTAAGAAGGACCCGACATC 



-U2 0RF = pC62S! ORF 
■ pHH3b 




Cp KEEVTLRVVv 



-U3 OHF = pLM5 ORF 
R M P f>- 0 H I IKGD 



»-KQQ ErFLG 



C AAGGTC AGTGGAAAAGTTGACTGGAAGATGCTGGAT GAAGCTGTTTTCCAAGTGTTCAAGGAC ^ATATTTCT AAAATGGACCCAGrrTrTArrrTrf f a 




KVSGKVDVKM 



M L D £ A V F Q y F K p Y , S K M 0 P 



A S T L G 



CTAAGCACTGAGTCCArCCATGGCTACAGCATCAGCCACGTGAAACGAGrGTTn^Tr.r,^^^^^^ 



ATGCCTCCTTGCCGTCGAGGTGTCAATA 




mo; 



L s " 5 ■ * ° ■ * ■ ° * »» ■ . . . « E , , . „ , , cT.777 





CC i OH TG AAGCACCGGCGCCTCGTCCTC TCGGGCCCC AGCGGCACGGGCAA6ACC TACCTGACCAA TCGCTTGGCCGAGTACC TG3TG6AGCGCTC TGGL 
GGACGACTTCGTGGCCGCGGAGCAGGAGAGCCCGGGGTCGCCGTGCCCGTTCTGGATGGACTGGTTAGCGAACCGGCTCATGGACCACCTCGCGAGACC^ 



-U2 ORF = pCB251 ORF



pHH3b

U4 ORF = pCB201 ORF
full available ORF HU-Unc53/1 = pLM1 OR

U3 ORF = pLM5 ORF

pHH15

LLKHRRLVLSGPSGTGKTYLTNRLAEYLVERSG 

' . L — . 1 ■ ■ * . I ■ — - . L . , . . . . . . . ■ . 

CGTGAGGTCACAGAGGGCATCGTCAGCACCTTCAACATGCACCAGCAGTCTTGCAAGGATCTGCAACTGTATCTTTCCAACCTAGCCAACCAGATAGACC 

' ■ ; ' 1 — ' 1 i 1 t > i ' i ■ i 1 i i 1 t • t i 

GCACTCCAGTGTCTCCCGTAGCAGTCGTGGAAGTTGTACGTGGTCGTCAGAACGTTCCTAGACGTTGACATAGAAA GGTTGGATCGGTTGGTCTATCTGG 

U2 ORF = pCB251 ORF

pHH3b

U4 ORF = pCB201 ORF
full available ORF HU-Unc53/1 = pLM1 OR

U3 ORF = pLM5 ORF

pHH15

REVTEGIVSTFNMHQQSCKDLQLYLSNLANQID

GG3AAAC AGGAATTGGGGA TGTGCC CC TGGTGATTCTAT TGGATGACCTGAGTGAAGCAGGC TCCATCAGTGAGTTGG TCAATGGGGCCCTCACCTGCAA 
CC:TTTGTCCTTAACCCCTACACGGGGACCACTAAGATAACCTACTGGACTCACTTCGTCCGAGGT AGTCACTCAACCAGTTACCCCGGGAGTGGAC<iTT 

U2 ORF = pCB251 ORF

pHH3b

U4 ORF = pCB201 ORF

full available ORF HU-Unc53/1 = pLM1 OR

U3 ORF = pLM5 ORF

pHH15

U 

R - TG I G 0 V P L V ! !_ LDDLSEAGS I S E L V N G A L TCh 
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!!2_. Hu-Unc53/1 seq (1 >6013) Site and Sequence 






GTATCA.AAATGrCCCrATATTATAGGTACCACCAATCAGCCTGTAAAAATGACACCCAACCA TGGCTTGCACTTG^GCTTCAG 

5A rAG I ATTTACAGGG ^ T . : 



GGAT GTTGACC r TC"Cw 

GAAG TCC TACAAC TGuAAoAGU 



-U2 OR? = pC6251 ORF 
-pHH3b 




-full available ORF HU-Unc53/1 = pLMl OR 



-peptide B72826H 



-pHH15 



Y H K C P Y [ IGTTNOPVKM 



TPNHGLHL5FRM 



L T F 



AACAACGTGGAGCCAGCCAATGGCTTCCTGGTTCGTTACCTGAGGAGGAAGCTGGTAGAGTCAGACAGCG^ 

_y^" G CAC ^ TCG G T -^^ ^ T ^ A CC G A AG G AC CAAGCAATGGAC T ^ ^ T ^£^j^^^^CA7^TC AG TC TGTCGCTGTAfiTTArr:r.TTr.TT/'rTTi~ Tr ^ .1 



■Ui> ORF - pCB25l ORF 
pHH3b 




N N V £ p 



-pHHlS 



A u G FLVRYLRRKLVESOS 



D INANKEELL 



GGGTGCTCGACTGG GTACCCAAGCTGTGGTATCATCTCCACACCTTCC TTGAGAAGCACAGCACCTCAGACTTCCTCATCGGCCC TTG CTTCTTTC TGT" 
CCCACGAGCTGACCCATGGGTTCGACACCA TAGTAGAGGTGTGGAAGGA AC TCTTCGTGTCGTGGAGTCTGAAGGAGTiinrrfinfiAAr,- 1^ .-. i 




■PHH15 



B V - L ° V V P . K L tf Y H L H T F L E K H S T S D F L : 



G P C F F L 






GToTCCCATTGGCATTGAGGACTTCCGGACCTGGTTCATTGACC TGTGGAACAACTCTA TCArTCCCTATCTACAGGAAGGAGCCAAGGAfGGGATAAA^ 
CA CAuGGT A ACC G T AA C TC CTGAAGuCCTGGACC AAGTAACTGGAC ACCTTGTTGAGATAGTAAGGGATAGATGTCCTTCCTCGGTTCC TACCC7ATTC 



-U2 ORF = pCB251 ORF




-U3 ORF = pLM5 ORF



-pHH15



CPIGIEDFRTWFIDLWNNSIIPYLQEGAKDGIK

GTCCATGGACAGAAAGCTGCTTGGGAGGACCCAGTGGAATGGGTCCGGGACACACTTCCCTGGCCATCAGCCCAACA AGACCAATCAAAGCTGTACCAC: 
CAGGTACCTGTCTTTCGACGAACCCTCCTGGGTCACCTTACCCAGGCCCTGTGTGAAGGGACCGGTAGTCGGGTTGTTCTGGTTAGTTTCGACATGG-G3 



-U2 ORF = pCB£S1 ORF 



- pHH3b 




"U3 ORF = pLM5 ORF 



-pHH15



VHGQKAAWEDPVEWVRDTLPWPSAQQDQSKLYH



'GCCCCC ACCCACCGTGGGCCC TCAC AGCATTGCCTCACCTCCCGAGGATAGGACAGTCAAAGACAGCACCCC AAGTTCTCTGG ACTCAGATCC T C TGA~ 
ACGGGGGTGGGTGGCACCCGGGAGTGTCGTAACGGAGTGGAGGGCTCCTATCCTGTCAGrTTCTGTCG7G3GGTTCAAGAGACCTGAGTCrAGGAGACT^ 



'USt ORF ::: pCBSSi OR 




-full available ORF HU-Unc53/1 = pLM1 OR




LPPPTVGPHSIASPPEDRTVKDSTPSSLDSDPLM
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fio -Iu-Unc53/1 seq (1 >6013) Site and Sequence Vi 9 

GGCCATGCTGCTGAAACTTCAAGAAGCTGCCAACTAC ArTGAGTCTCCAGATCGAGAAACCATCCrGGACCCCAACCTTCAGGCAACACTTTAAGGGTTC 
CCGGTACGACGACTTTGAAGTTCTTCGACGGTTGATGTAACTCAGAGGTCTAGCTCTTTGGTAGGACCTGGGGTTGGAAGTCCGTTGTGAAATTCCCAA^ 



>, 



' U2 ORF = pCS251 ORF „ZJ 



- pHH3b 



-U4 ORF = pCB201 ORF



-full available ORF HU-Unc53/1 = pLM1 OR



>, 



-U3 ORF = pLM5 ORF



-peptide B72625H



AMLLKLQEAANYIESPDRETILDPNLQATL.GF

GGCAATCACTGTCACCCCCGGACAGCAGAACGCTGGCATCAGCTATCTTAGCTCCTCCTC TCCCCTCTCCTCTTTCAGAGCAC TG GCTC TCCAGCCCCAo 
CCGTTAGTGACAGTGGGGGCCTGTCGTCTTGCGACCGTAGTCGATAGAATCGAGGAGGAGAGGGGAGAGGAGAAAGTCTCGTGACCGAGAGGTCGGGGTC 



-pHH3b 



pHHlS 

G N H C H P R T A E R V H Q L S . L L L S P LLFQSTGSPAP 

GAGGAGAACAGGAGGGAGGAGGAGATGAAAGAGSAGGGACAGGTTCTTGGTGCTGTACCTTTGAGAACT TCCTAGGAAGGAATGGTGGGGTGGCGTTTGG 
C "CCTCTTGTCCTCCCTCCTCCTCTACTTTCTCCTCCCTGTCCAAG AACC ACGACATGGA/VACTCTTGAAGGA TCCTTCCTTACC ACCCCACCGCAAACC 



pHH3b 



-pHHtS 

ttS EQEGC^C DERGGT-GSVCCTFENFLGRNGG 



V A F 



GAACrTGTGCCCCCTAAACACATTTACTGGCCTCCTCTAATGACTTTGGGGAAAAGATGATTCTGG GTCTTTCCCTTGACTTCTTGTTTCAATTACAAA: 
CrTGAACACGGGGGATTTGTGTAAATGACCGGAGGAGATrACTGAAACCCCTTTTCTACTAAGACCCAGAAAGGGAACTGAAGAACAAAGrTAArGTTT^ "'"^ 



■pHH3b 



-pHH15 



NLCPLNTFTGLL. . LVGKDOSGSF P 



L L V S I T N 



TCC ^QGCTTTCTGGGGAGGGGTTCAGAAAACATCAAAACACTGCAGCAGTTCCTAAATGATTCTCACAAGCAACCCT GAGAGAGACAG TCTTGTGAGG3 
AGGACCCGAAAGACCCCTCCCCAAGTCTTTTGTAGTTTTGTGACGTCGTCAAGGATTTACTAAGAGTGTTCGTTGGGACTCTCTCTGTCAGAACACTCr." 

— — — I, 

pHH3b 1 

' ' — pHH15 — — 

S V A , F v G G VQKTSKHCSSS.NILTSNPEROSLVP 
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fig. Hu-Unc53/1seq (1>6013) Site and Sanue nee 






' ^ATCTGGGGGAGGCAGGAAGCTCCTCAGATTTTCTCACAGACCCTTCCCAATTrr.rr.r 



TCTAG 



ACCCCCTCCGTCCTTCGAGGAGTCTAAAAG 



CACTGCCAACAACTCcrCCCCCAGAGATCTGGCTGGAar 



AGTGTCTGGGAAGGGTTAAGGTAGrGGTGACGGTTGTTGAGGAGGGGGTCTCTAGArr^rrr.4 570 



-PHH15 



' ' ' 6 5 ' ' ° ' f 8 ° ' ^ ' " ^ ' ' ' ' » ~ s s . „ 0 L A c , 



kmfksickr 




SVR£R NQLpR 



, R * A R S C P L D 0 I G 0 I N 



TACTG TATCCCC TGTAGT7GT TC TGCCGACGGTTG j 



K T A A tl 



TGAGAAGTCACCAAAC 




ATTAAArTGTTACGTGGCCrrAAGTC 

E 



50CC 




GAACCTGAATTGG 

— 2 

linker? ' 

L D L T 



6013 



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GGCACGAGGCA7CCTCTGTGGGCACCGAGGTCA CCGAGACCCCTGCTCATTCAGTCCCCCACACTAGACT 70 
^ | open reading frame 



linker ? 



L 



RHEASSVGTEVTETPAHSVPHTRL 
GTTCCAAGCCAATGAAGAGGAGGAGCCAGAGAAGAAGGAGGTATCAGAACTGCGCTCTGAACTATGGGAA 140



open reading frame



FQANEEEEPEKKEVSELRSELWE

AAAGAGATGAAGCTCACGGATATCCGGTTGGAGGCCCTCAACTCTGCCCACCAGCTGGACCAGCTTCGGG 210

open reading frame



KEMKLTDIRLEALNSAHQLDQLR

AGACCATGCACAATATGCAGTTGGAGGTGGACCTGCTGAAAGCAGAGAATGACCGGCTGAAGGTTGCCCC 280
open reading frame

ETMHNMQLEVDLLKAENDRLKVAP

CGGCCCCTCCTCAGGCTGCACTCCAGGGCAGGTCCCTGGGTCATCGGCTCTGTCGTCCCCTCGACGTTCC 350
open reading frame

GPSSGCTPGQVPGSSALSSPRRS

CTGGGCCTTGCACTCAGCCATCCTTTCAGTCCTAGTCTCACAGACACAGACCTCTCACCCATGGATGGCA 420
open reading frame

LGLALSHPFSPSLTDTDLSPMDG

TCAGCACCTGTGGTTCAAAGGAAGAGGTGACCCTGCGGGTGGTGGTCCGGATGCCGCCCCAGCACATCAT 490

open reading frame



ISTCGSKEEVTLRVVVRMPPQHII

CAAAGGGGACTTAAAGCAGCAGGAGT7CT7CCTSGGTTGCAGCAAGG:C^G7GGCAAaGT"3ACT:GAAG 
open reading frame 

KGDLKQQEFFLGCSKVSGKVDWK

ATGCTGGATGAAGCCGTTTTCCAAG7GTTCAAGjACTACAT7TC"AAAA'0GACC:aGCCTCAa::c t GG cjI» 
open reading frame 

MLDEAVFQVFKDYISKMDPASTL

GAC TGAGC AC TGAGTCCATACATGGC7 ATAGCC t C AGCCACGTGAAACGAGTGCTGGA TGC fGAGICCCC "X 
open reading frame 

GLSTESIHGYSLSHVKRVLDAEP







GT CGACAGCCTGGrGTTCGAGACGCrTATCCC C AAGCCCATGATGCAKrAC TACAT CAGCC T TTTixr rr n 6 , Q 
- _ open reading frame ~ ' 



V F 



T L I P K P M M 



L L 



V 0 S L 

AGCACCGGCGCCTGGTGCTC TCCGGCCCC AGT GGC ACCGGCAAGACCrACTTnArr HTrrlr li-^ ^ 
■ . . open reading frame ~ 



K ^""l-VI.SGPSGTGlCTYL T N 3 (_ A r 

GTACC:GGTGGAGCGCTCCGGCCGCGAGGTCAr G GATGGCArCGTCAr,r.crTTCAACAT.r, rr^r,^ „ 3 
■ open reading frame " — — ~ 



Y L V £ 




^ACCATGGC TTGCAC TTGAGCTTr Af^r; AT/^rTr Arr y j^y^Q^^^^^^^ AArr J_ 

■ open reading frame ' " 1 ~ 




I£GGGTG^TGGA C 7GGGTGCCCAAGC TG T GG rATCACC TCCACACC TT(~r Tr.r: A/r^j^-^-f^-j-p - -- 
open reading frame — — * 






GACTTCCTCATTGGCCCTTGCTTCTTCCTGTCCTGTCCCATTG GCATCGACCACTTCCGGACCTCGTTCA U70 

~ ~ ~" open reading frame 



DFLIGPCFFLSCPIGIEDFRTWF

ttgacctgtggaacaattccatcatcccctatctacaggaaggagccaaggatgggatcaaggttcatgg wo 



open reading frame



IDLWNNSIIPYLQEGAKDGIKVHG
ar A^AAA firTGCTTGGGAAGACCCGGTGGAATGGGTCCGAGACACT rTTCCCrGGCCGTCGGCCCAACAA »6 



10 



open reading frame 



QKAAWEDPVEWVRDTLPWPSAQQ

GACCAATCAAAGCTCTACCACCTGCCCCCGCCTTCTGTGGGCC CCCACAGCACTGCCTCACCCCCGGAG5 ^680 
open reading frame



DQSKLYHLPPPSVGPHSTASPPE
ACAGGACAGTCAAAGACAGCACTCCAAACTCCCTCGACTCAGATC rrrTGATSGCCATGCTACTGAAACT 

" ~~ open reading frame . 



750 



320 



DRTVKDSTPNSLDSDPLMAMLLKL
CCAAGAAGCTGCCAACTACATTGAGTCACCAGATCGAGAGACTATCCTGGACCCCAACCTCCAGGCGACA

open reading frame

QEAANYIESPDRETILDPNLQAT
CTC TGAGGGCCCGGCAG7CACTGTCACCCTGGAGGGCAGAAGGCT6GCTTCASCATCATTAGCTCTCCTC 1330 
3' untranslated

L . GP6SHCHPGG0KAGFSI ISSP 
TGCCCTCTTCCTTCATAGCTCTGGC7CACCAGCCTCGCC AAGAGAACAGGAf.r,GAAGAAGAGGGCAGGAG :960 

3' untranslated

L P S S F I A L A H 0 P a C E N R : £ E l G R a 
GAGGGATGGGTTCTCGGTGCTGAACCTTTGAGAACTTCCTACTAGGAATT f,GAGGGGGTGGAGTTTGAGA :oy. 

3' untranslated . 

PDGFSV-wNL.EtPTRN.-RGvSL? 
AC TCCGTGCCCCTTAAC T-C ATT'GCTGGCCTCCTCTTACGAC TT AGG AGAAAAGATG AT T CTGGTC T T T - ' • 

3' untranslated 

TPCPLTTF AGIILRLPRKDuSG:. 

TC TTCAAGTTTTGTTTCACC TACAAACTC TTCGGC TTTCTGGGGAGGGAT TCGGAAGA TaTAAACAGACA 2 ■ 

3' untranslated _ 



fKFCF rvsLLGFLGROS 



N P 0 






AACAAAAAC AAACAA ACCAACTAC AGCAG T TCCAACCTCG TTC TC ACAA A CACCTC TGAGAf - AfiTr tr tr 22<(0 

. 3' untranslated 

T K T N * P T 7 A V P S S F S 0 T P L r o S H 

GTGGGCAAArCTAAGGGAGGCAGGAAGCTCTA C AGACTTTCTTGCAAACCCrTCCCAGTTCTr.TrGA rAr -v> J0 
— . 3' untranslated 

V C KS<G GRKLYRLSCKPF PVLST 

TGCCAACAACCTCCCCGCCAoAGACCrGGC CAGAGCCAAGAAAAGAGAAGCArGTGGT7 T A ArAfiflAAA/\ 2 . 80 
. 3' untranslated ~ • 

L P ' T 5 ? P ^ ' V P £ P R K E K H V V . 0 K !V 

CAAAACAAAACAAAACAAA AAATATA TGTG TAAATCAACC TGTAGAAGGTAAAAACGGCAATGG AAAAftA 2q50 
._ 3* untranslated ' 

" T K ° » K < ' M C K S T C R R . K R 0 V K R 

TGAAGCTGGAAGGAGGGGCCCAGTTGCCAAG ATGGAACGAGAGCTGCCAGArCTTGCCTT-T. nAT^rA ,« 20 
. 3' untranslated " " " ■ Cw 

• S V K E G P S C 0 0 G T R a A R S C L L 0 0 " 

AGAGGGGACATTGCAAGArGGCTGCCAGTCTA A AACGTCACCAGACCACAAGAGTAACATCACAGr rrTr .gen 
_ 3' untranslated " " *~ w 

K " 6 H C * M A A S I K R H 0 T T R V r S o p s 

GAAGAAAGGCCACAAGC T GTCTTTC TGCCC TC TAACTGAACATGC ATGaa a AG TCAA r AAACCCTAT T TT - 660 

3' untranslated ~ " 

K K G H K L 5 * c p l r E h a . K v , K p Y F 

TTAATTTTTAAAAAAAAAAAAAAAAAAAAAAAAAAATTTCCGCG GCCGC 2709 
polyA tail + linker
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AAGCTTGGCACGAGGCCTCG7GCCAAGCTGAGACCGTCATGCAGCTCCGAAATGAGTTAAGAGACAAGGA 70 



| LINKER? )| open reading frame 


AWHEASCQAETVMQLRNELRDK


r 


GATGAAGCTGACAGATATCCGCTTAGAAGCTCTCAGTTCTGCCCACCAGCTGGACCAGCTCCGGGAGGC 


: i«o 


open reading frame 


MKLTDIRLEALSSAHQLDQLREA




ATGAACAGGATGCAGAGTGAAATAGAGAAGCTGAAAGCTGAGAATGATCGGCTGAAGTCAGAGTCTCAAG 210


open reading frame 


MNRMQSEIEKLKAENDRLKSESQ




GCAGTGGCTGCAGCCGGGCTCCTTCCCAAGTGTCCATCTC TGCCTCCCCGAGGCAG TCCATGGGCC TC "TC 


230 


open reading frame 




GSGCSRAPSQVSISASPRQSMGLS




CCAGCACAGCTTGAACCTCACTGAGTCAACCAGCCTGGACATGTTGCTGGATGACACTGGTGAATGCTCG


250 


open reading frame 




QHSLNLTESTSLDMLLDDTGECS




GCTCGGAAGGAAGGAGGCAGGCATGTTAAGATAGTTGTCAGCTTTCAGGAGGAAATGAAGTGGAAGGAGG 


420 


open reading frame 




ARKEGGRHVKIVVSFQEEMKWKE




ATTCCAGACCACACCTCTTTCTTATTGGCTGCATTGGAGTTAGTGGCAAGACGAAG TGGGATGTGCTCGA 




open reading frame 




DSRPHLFLIGCIGVSGKTKWDVLD




TGGGGTGGTTAGACGGCTGTTCAAAGAATACATCATTCATGTCGACCCAGTGAGTCAGCTAGGGCTGAAT 


560 


open reading frame 




GVVRRLFKEYIIHVDPVSQLGLN




TCAGACAGCGTTCTTGGCTACAGCATTGGAGAAATCAAGCGCAGCAACACTTCCGAAACACCGGAGCTGC 


620 


open reading frame 




SOSVLGYS ! GE 1KRSNTSETPEL 




TTCCTTGTGGCTA r CTGGTTGGAGAGAACACGACCATCTCAGTGACTGTGAAAGGGCTCGCAGAAAACAG 


TOO 


open reading frame 




LPCGYLVGENTTISVTVKGLAENS 




CC TGGAC TC AC TGGTGTTTGAGTCC TTGATTCCCAAGCCCATCCTGCAGCGCTACGTCTCCCTCCTGATA 


770 


open reading frame 




losl *'F"esl ipkpiloryvsll : 




GAGCACCGTCGGATCATTCTCTCTGGCCCCAGCGGCACTGGGAAAACCTACCTGGCCAACCGGCTGTC'3 


8*40 


open reading frame 





ehrr: ilsgpsgtgktylanrls 
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— open reading frame " ' — - ■ 13 .si * 0b0 



M P l 



7 I I L 0 N 



L h h 



open readi ng tramo — " 1 I'mc.T .20 



i reading I 
Y H K C P v I I g 



— — ~ A - LINKER-vector — \ 
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WO 98/24810 



165/270 
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10 

1 J 


i0 3,0 40 5f 0 


1 1.- 50 UNC53_hu^ 
5 1- 48 


AAGCTTGGCA CGAGGCC 
A W K E A 

60 


TCC TGCCAACCTG AGACCGTCAT GCAGCTCCGA 
S CQA »TVM 0 I* R 

70 80 IO 100 


1 51-100 jNCS3_hmna 
5 49-. 9-3 


AATGAS77AA GAGACAAGSA GATGAALit- .1* AUU-.AiA^L U_1'1AU**UU 
,.; £ u RDKE MKL TDi RLEA 

110 l« 13,0 ijo IJO 


i 101-150 lS8C33_humd 
5 99-148 


~,- T --':— r- GCCCACCAGC TGGACCAGCT CCGGGAGGCC ATGAACAGGA 
' :. 5*S AHQ L30L R E A KMR 

XiO i]o 'f> M° 3 ?° 


1 151. ..200 UNC53_^uma 
5 149...198 


rcCAGACHCA AA7AGAGAAC CTGAAAGCTG AGAATGATCG GCTGAAGTCA 
M Q S E I E K Z, K A ENOR LKS 

J10 2?5 ^9 J f° 


I 20 1.-250 5 3 _nuira 
5 199-.24B 


C-AG-CTCAAG GCAGTGGCTG CAGCCGC-GC? CC7TCCCAAG TGTCCATCTC 
S S Q GS 3C SRA PSQ VSIS 

a«o 270 if* T T 


1 251.-300 UNC53_^r-L-n<J 
5 249-298 


.„„ ccc ^i AGGCAGTC C A TGGGCCTC7C CCAGCACAGC TTGAACCTCA 
A *5 P KQS MGLS QMS LML 

510 32C 3*0 JS0 


; } 01-350 'JWC53 Jrrcxa 
5 299.-348 


C-CVJTCAAC CAGCCTGGAC ATCT7GCTGG ATGACACTGG TCAATGCTCO 
t'est SLD MLL DDTG ECS 

360 i]0 J«0 3^0 400 


; 351.-iCO L74C5 3 _hux5 
5 349-393 


GCTCGGAAGG AAGGAGGCAG GCA7GTTAAG A7AG7TGTCA GCTTTCAGGA 
ARK E G G R H V K I V V S F Q E 
410 420 U0 4^0 


1 4 C 1.-4 50 UNC52_*vjna 
5 399.-44 3 


GGAAA7GAAG TGGAAGGAGG ATTCCAGACC ACACCTCTTT CTTATTGGCT 
SMK WKE DSR? KLF LIC 

460 «]0 *«« *?° S ? 0 


: 451_*00 UNC53_r.ur-a 
d 449-498 


GCA77GGAGT 7AGTGGCAAG ACGAAG7GGG ATGTGCTCGA 7GGGGTCGTT 
CIGV SGK TKW DVLD GVV 
sio sio sjo s«e s^o 


I EC 1.-5 SO UNCS3_r.^:J 
5 499.-5C8 


AGACCGCTGT TCAAAGAATA CATCA7TCAT GTCCACCCAG TGAGTCAGCT 
R R I. FKEV I I H V D ? </ S Q L 

S60 5-jO »f° 6 1° 


1 551-.600 UNC 5 3 
5 549.-598 


AC&CT3MT TCAGACAGCC- 77C77GGC7A CAGCATTGGA GAAATCAAGC 
G LN SCS VLGY S-v. EIK 

JI0 «]0 «f 


1 cGl-iSC L*N~S?._r. j.r* 
3 599-64* 


LakaacSc ttcccaaaca ccogaoctcc -ccrrcTGG CTAWTUUrr 
|rSMT SST PEL L r C ^ fL V 

660 «7, •?* T 


i •55i..."G3 ;a;cs. , . - K .'^< 

5 649.-696 


i GG AG AG AACA C5ACCATCTC AGTCACTC-TC- AAAGGGCTCG CAUAAACAC 
Igen TTIS VTV KGL aens 



WO 98/24810 166/270 PCT/EP97/06956



5 699..?48 



j 7 P 720" fjo m 

1-750 ^53J:ur^ : U^ CTOSXQ^ ^um TCCCXAGCcl ATCCTGCAQC" 



3 IDS T V F p r t r '^^"-CC ATCCTGCAGC 

I c,nKR IlLSG? 

; so: :|| c ^^Ap^Jl^^ 

^ i?c 900 



. S51-900 'JKC5J.hu.Tj 
5 849-898 



S60 



T 



~.-TT C*AcW 2«««C^ =CT,ATCGCC ACC^TAAcl 
"° »* C 940 ,j„ 



91? 

_L 



<• - i S " 9 ?° »*» 


5 949...99S 


»»»« 


I 1 001_I C5C fVC* -i v. 
5 999-1048 *" 


- crAcjtus ^ctctct gggccaga^ ttcaatgc^ 

»ssl ce: fng 


5 1C49..IC99 


■•-t~ c -§ •- J ^ GT r^ TTT 7 f A T"r^ 

Mr 0 l V c ", 30 : V S 


L n::-.:isc- l-kcs* 

5 *09?.-.ll4a 


Tt-"5T rr 8 ? 1 ^ c r cAccATji ^ndHish 


I 11S1.U2CC -JSC 5 3 r.J 
3 I149..1I98 


C-.-C..-XUT gccaaccac, u^ccwj CAACGGrr^ cpcccc^ 

: v° ii , 40 L ° *»« 


3 1139*1245 


••- t; =..GGAG SAAGCItJlIO GAAACAGA^A TCAGTGGGC3 GGT^CGCAAT - 
X_x ETS I 3 G R VR H 

1»0 : 30 o 


* :249..12r6 j 
I UC1...13 5C- Ji 


..A,. „u -AAAAATCAT TCACTCCA7T CCCAAGGTCT GGCATCACC^ 
VXIx DW: PKV WHHL 

"r : V C ", so 


^ 12?9...ij40 


'n"T T f~ c : k:aG3 C" c acagttcctc CGACGTCACC atcobccccc 

" c A HSSS DVT ICP 
,3 . 40 iJ , 7 ° n^c li,0 , 40C 


i - - l-I. 4 CO LWC3 3 h . " 
5 1349..1JSS. " 3 


^..v.^. ...CA^CCCC ATCGA7CTGG ACCGCTCCaA AG7GTGGTTC~ 
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y RACE (placenta) 
— polyA EST 923793 



5* RACE (placenta) cl 1 .4 
5' RACE (placenta) cl 3.7 



GTCAATGAG 

PCR (HeLa) EST 485068- hh2 E 2.3 

— PCR (placenta) EST 485068- hh2 C 2,3 

PCR (HeLa) EST01222-hh2 E 1.3-3 

— 5' RACE (placenta) B2.1 

— 5" RACE (HeLa) D2.1 

— — 5' RACE (adenocarcinoma SW480) H2.1 
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5* RACE (adenocarcinoma S W480) D2. 1 -5 
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+1GTRVIT IG PRL FLSC PID VDG SRVW FTD 
1 GGCACGAGGG TTACCATCGG CCCCCGGCTC TTCCTGTCAT GCCCCATCGA TGTGGACGGC TCGAGAGTGT GGTTCACCGA
CCGTGCTCCC AATGGTAGCC GGGGGCCGAG AAGGACAGTA CGGGGTAGCT ACACCTGCCG AGCTCTCACA CCAAGTGGCT 



+ 1 LWN YSII P Y L LEA VREG LQL YGR RAP 
81 CTTGTGGAAC TATTCCATTA TCCCCTATCT CCTGGAAGCC GTCAGAGAAG GACTCCAGCT CTATGGAAGG CGCGCCCCCT 
GAACACCTTG ATAAGGTAAT AGGGGATAGA GGACCTTCGG CAGTCTCTTC CTGAGGTCGA GATACCTTCC GCGCGGGGGA 



+ 1WEDP AKW VMDT YPW A A S PQQH EWP PLL 

NCOl 



161 GGGAGGATCC TGCCAAGTGG GTGATGGACA CATATCCATG GGCAGCCAGC CCACAACAGC ACGAGTGGCC TCCCCTGCTG 
CCCTCCTAGG ACGGTTCACC CACTACCTGT GTATAGGTAC CCGTCGGTCG GGTGTTGTCG TGCTCACCGG AGGGGACGAC 



+1QLRP EDV GFD GYSM PRE GST SKQM PPS 
241 CAGTTACGGC CTGAGGATGT CGGCTTCGAC GGCTACTCCA TGCCTCGGGA GGGATCGACA AGCAAGCAGA TGCCCCCCAG 
GTCAATGCCG GACTCCTACA GCCGAAGCTG CCGATGAGGT ACGGAGCCCT CCCTAGCTGT TCGTTCGTCT ACGGGGGGTC 



+ 1 DAE GDPL MNM LMR LQEA ANY SSP QSY 
321 TGATGCTGAA GGTGACCCGC TGATGAACAT GCTGATGAGG CTGCAGGAGG CAGCCAACTA CTCCAGCCCC CAGAGCTATG 
ACTACGACTT CCACTGGGCG ACTACTTGTA CGACTACTCC GACGTCCTCC GTCGGTTGAT GAGGTCGGGG GTCTCGATAC 



+ 1DSDS N S N SHHE D I L DSS LEST L * Q GPG 
401 ACAGCGACTC CAACAGCAAC AGCCATCACG AAGACATCTT GGACTCCTCT TTGGAGTCCA CTCTGTGACA GGGGCCCGGA 
TGTCGCTGAG GTTGTCGTTG TCGGTAGTGC TTCTGTAGAA CCTGAGGAGA AACCTCAGGT GAGACACTGT CCCCGGGCCT 



+1AQRP PLL LTA FHLH PPH HPE DDFL SQP 
481 GCCCAGCGCC CTCCTCTTCT CCTCACCGCA TTCCACCTGC ATCCCCCACA TCACCCTGAA GATGACTTCC TGAGCCAGCC 
CGGGTCGCGG GAGGAGAAGA GGAGTGGCGT AAGGTGGACG TAGGGGGTGT AGTGGGACTT CTACTGAAGG ACTCGGTCGG 



+1 PAT ALEL REH RDP PSFS LDL GAG IPG 
561 CCCAGCCACA GCCTTAGAGC TGCGGGAACA CCGAGACCCC CCGTCCTTCA GCCTCGACCT GGGTGCAGGC ATCCCGGGCC 
GGGTCGGTGT CGGAATCTCG ACGCCCTTGT GGCTCTGGGG GGCAGGAAGT CGGAGCTGGA CCCACGTCCG TAGGGCCCGG 



+1QLPA DRF LPQR ELH YLL LYFN YCF ALL 
641 AGCTGCCTGC GGACCGCTTC CTTCCACAGC GAGAACTGCA CTACCTTCTG TTGTACTTTA ATTATTGTTT TGCCTTGTTG 
TCGACGGACG CCTGGCGAAG GAAGGTGTCG CTCTTGACGT GATGGAAGAC AACATGAAAT TAATAACAAA ACGGAACAAC 



+ 1L*PP * D T EDT SRER IIA V E M KKKK KKK 
721 CTGTGACCTC CCTAAGACAC TGAAGATACT TCTCGGGAAA GGATCATCGC CGTTGAAATG AAAAAAAAAA AAAAAAAAAA 
GACACTGGAG GGATTCTGTG ACTTCTATGA AGAGCCCTTT CCTAGTAGCG GCAACTTTAC 



+ 1 KKK 
801 AAAAAAAAAA 




N E G G R K L 
lCG AAGGCGGCCG CAAGCTT 
'TTGC TTCCGCCGGC GTTCGAA 
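The two-strand rows above can be cross-checked against each other, because the lower row of each pair should be the base-by-base complement of the upper row. The following is a minimal sketch of such a check, not part of the original listing. The function names are hypothetical, and the second 10-mer of the first row is read as TTACCATCGG, an assumption based on the AATGGTAGCC printed beneath it.

```python
# Watson-Crick complement, read left to right in the same order as the listing.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the base-by-base complement, keeping the spaces between 10-mers."""
    return strand.upper().translate(COMPLEMENT)


def mismatches(top: str, bottom: str) -> list:
    """1-based positions (spaces ignored) where bottom is not the complement of top."""
    t = top.replace(" ", "").upper()
    b = bottom.replace(" ", "").upper()
    return [i + 1 for i, (x, y) in enumerate(zip(t, b))
            if x.translate(COMPLEMENT) != y]


# First three 10-mers of row 1 and of the strand printed beneath it.
top = "GGCACGAGGG TTACCATCGG CCCCCGGCTC"
bottom = "CCGTGCTCCC AATGGTAGCC GGGGGCCGAG"

print(complement(top) == bottom)
print(mismatches(top, bottom))
```

A non-empty result from `mismatches` points at a position where one of the two strands was probably misread during extraction.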
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+ 1IPLT I G L ERP PRQVlPHL DRN TLPK KGL 
1 ATCCCACTCA CTATAGGGCT CGAGCGGCCG CCCAGGCAGG TCCCGCACCT TGATAGGAAC ACTTTGCCTA AGAAAGGACT
TAGGGTGAGT GATATCCCGA GCTCGCCGGC GGGTCCGTCC AGGGCGTGGA ACTATCCTTG TGAAACGGAT TCTTTCCTGA 



+ 1 RYT PTSQ LRT QED AKEW LRS H S A GGL 
81 CAGGTATACT CCCACCTCCC AGCTTCGCAC GCAAGAAGAT GCAAAAGAAT GGTTACGGTC CCATTCTGCA GGAGGCCTTC 
GTCCATATGA GGGTGGAGGG TCGAAGCGTG CGTTCTTCTA CGTTTTCTTA CCAATGCCAG GGTAAGACGT CCTCCGGAAG 



+ 1QDTA ANS PFSS GSS VTS PSGT R F N F S Q 
161 AGGACACCGC TGCCAATTCC CCCTTTTCCT CTGGCTCCAG CGTGACTTCT CCCTCCGGAA CAAGATTCAA CTTTTCCCAG 
TCCTGTGGCG ACGGTTAAGG GGGAAAAGGA GACCGAGGTC GCACTGAAGA GGGAGGCCTT GTTCTAAGTT GAAAAGGGTC 



+1LASP TTV TQM SLSN PTM LRT HSLS NAD 
241 CTTGCGAGTC CCACCACTGT CACCCAGATG AGCTTGTCCA ACCCGACCAT GCTGAGGACT CACAGCCTCT CCAATGCTGA 
GAACGCTCAG GGTGGTGACA GTGGGTCTAC TCGAACAGGT TGGGCTGGTA CGACTCCTGA GTGTCGGAGA GGTTACGACT 



+ 1 GQY DPYT DSR FRN SSMS LDE KSR TMS 
321 TGGGCAGTAT GATCCATACA CTGACAGCCG CTTCCGGAAT AGCTCCATGT CCCTGGATGA GAAGAGCAGA ACCATGAGCC 
ACCCGTCATA CTAGGTATGT GACTGTCGGC GAAGGCCTTA TCGAGGTACA GGGACCTACT CTTCTCGTCT TGGTACTCGG 



+1RSGS FRD GFEE VHG SSL SLVS STL SVY 
401 GTTCAGGCTC ATTCCGGGAT GGGTTTGAAG AAGTTCATGG ATCCTCACTC TCCCTGGTTT CCAGCACATT GTCAGTTTAT 
CAAGTCCGAG TAAGGCCCTA CCCAAACTTC TTCAAGTACC TAGGAGTGAG AGGGACCAAA GGTCGTGTAA CAGTCAAATA 



+1STPE EKC QSE IRKL RRE LDA SQEK VSA 
481 TCTACACCAG AAGAAAAATG CCAGTCAGAG ATTCGCAAGC TGCGGCGGGA ACTGGATGCC TCCCAGGAGA AAGTTTCAGC 
AGATGTGGTC TTCTTTTTAC GGTCAGTCTC TAAGCGTTCG ACGCCGCCCT TGACCTACGG AGGGTCCTCT TTCAAAGTCG 



+ 1 LTT QLTA NAH LVA AFEQ SLG NMT IRL 
561 TTTGACCACC CAGCTGACAG CAAATGCTCA CCTTGTGGCT GCCTTTGAAC AGAGTCTTGG TAACATGACA ATCAGGCTCC 
AAACTGGTGG GTCGACTGTC GTTTACGAGT GGAACACCGA CGGAAACTTG TCTCAGAACC ATTGTACTGT TAGTCCGAGG 



+ 1QSLT MTA EQKD SEL NEL RKTI ELL KKQ 
641 AGAGTCTGAC CATGACAGCT GAGCAGAAGG ATTCAGAACT GAATGAGTTA AGAAAAACCA TTGAGCTGCT AAAGAAACAG 
TCTCAGACTG GTACTGTCGA CTCGTCTTCC TAAGTCTTGA CTTACTCAAT TCTTTTTGGT AACTCGACGA TTTCTTTGTC 



+ 1NAAA Q A A ING VINT PEL NCK GNGT AQS 
721 AACGCAGCTG CCCAGGCTGC CATTAATGGA GTAATTAACA CACCTGAGCT CAACTGCAAA GGAAACGGCA CTGCCCAGTC 
TTGCGTCGAC GGGTCCGACG GTAATTACCT CATTAATTGT GTGGACTCGA GTTGACGTTT CCTTTGCCGT GACGGGTCAG 



+ 1 ADL RIRR QHS SDS VSSI NSA TSH SSV 
801 TGCAGACCTC CGCATCCGCA GGCAGCACTC CTCAGACAGC GTCTCCAGCA TCAACAGTGC CACCAGCCAC TCCAGTGTGG 
ACGTCTGGAG GCGTAGGCGT CCGTCGTGAG GAGTCTGTCG CAGAGGTCGT AGTTGTCACG GTGGTCGGTG AGGTCACACC 



+ 1GSNI ESD SKKK KRK NWL RSSF KQA FGK 
881 GCAGCAACAT AGAGAGTGAC TCAAAGAAGA AGAAGAGGAA GAACTGGTTA CGCAGCTCCT TCAAGCAAGC TTTCGGGAAG 
CGTCGTTGTA TCTCTCACTG AGTTTCTTCT TCTTCTCCTT CTTGACCAAT GCGTCGAGGA AGTTCGTTCG AAAGCCCTTC 



+ 1 K K S P KSA SSH SDIE ETT DSS LPSS PKL 
961 AAGAAGTCCC CAAAATCTGC GTCCTCTCAT TCAGATATTG AGGAGACGAC GGATTCTTCT TTGCCTTCCT CACCAAAGTT 
TTCTTCAGGG GTTTTAGACG CAGGAGAGTA AGTCTATAAC TCCTCTGCTG CCTAAGAAGA AACGGAAGGA GTGGTTTCAA 



+ 1 PHN GSTG STP LLR NSHS NSL ISE CMD 
1041 ACCGCACAAT GGGTCCACAG GTTCCACCCC ACTGCTGAGG AATTCTCACT CCAACTCTCT AATTTCCGAA TGCATGGATA 
TGGCGTGTTA CCCAGGTGTC CAAGGTGGGG TGACGACTCC TTAAGAGTGA GGTTGAGAGA TTAAAGGCTT ACGTACCTAT



+ 1SEAE TVM QLRN ELR DKE MKLT DIR 
1121 GTGAAGCTGA GACCGTCATG CAGCTCCGAA ATGAGTTAAG AGACAAGGAG ATGAAGCTGA CGGATATCCG 

CACTTCGACT CTGGCAGTAC GTCGAGGCTT TACTCAATTC TCTGTTCCTC TACTTCGACT GCCTATAGGC G 
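The one-letter line printed above each nucleotide row is the conceptual translation of the upper strand. As an illustration only, the sketch below translates row 1121 with the standard genetic code. The function name and the `offset` parameter are hypothetical. The offset of 2 is an assumption that follows from codons starting at positions 1, 4, 7 and so on, which means the first complete codon in this row begins at position 1123.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, offset: int = 0) -> str:
    """Translate complete codons of dna, skipping the first `offset` bases.

    Spaces are ignored; codons containing unexpected characters become 'X'.
    """
    seq = dna.replace(" ", "").upper()[offset:]
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


row_1121 = ("GTGAAGCTGA GACCGTCATG CAGCTCCGAA ATGAGTTAAG "
            "AGACAAGGAG ATGAAGCTGA CGGATATCCG")
print(translate(row_1121, offset=2))
```

The result can be compared with the residues printed above the row; the leading S of that line comes from a codon that starts on the previous row and is therefore not produced here.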
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* 2 EFE LGTL TIG LER P P G Q V I R D GFE EVH 
1 CGAATTCGAG CTCGGTACAC TCACTATAGG GCTCGAGCGG CCGCCCGGGC AGGTuCGGGA TGGGTTTGAA GAAGTTCATG 
GCTTAAGCTC GAGCCATGTG AGTGATATCC CGAGCTCGCC GGCGGGCCCG TCCAdbcCCT ACCCAAACTT C^cIaG^A^ 

+2 G S S L SLV SSTS SVY STP EEKC O S E T R K 
S^T^ CTCCTTGGTT TCCAGCACAT CGTCAGTTTA TTCTACACCA GAAGAAAAAT GCCAGTCAGA GATTCGCAAG 
CTAGGAGTGA GAGGAACCAA AGGTCGTGTA GCAGTCAAAT aagatctggt cttcttttta cggtcagtct CTAAG^G^ 

+2LRRE LDA SQE KVSA LTT QLT A N A H T V * 
1 STSSS 00000 AACTGGATGC CTCCCAGGAG AAAGTTTCAG CTTTGACCAC CCAGCTGACA GCAAATGCTC ACCTTGTGGC 
_ GACGCCGCCC TTGACCTACG GAGGGTCCTC TTTCAAAGTC GAAACTGGTG GGTCGACTGT CGTTTACGaS ^aISc^ 

+2 AFE QSLG NMT IRL QSLT MTA EOK DSp' 
241 AGCCTTTGAA CAGAGTCTTG GTAACATGAC AATCAGGCTC CAGAGTCTGA CCATGACAGC TGAGCAGAAG GACTCAGAAC 
TCGGAAACTT GTCTCAGAAC CATTGTACTG TTAGTCCGAG GTCTCAGACT GGTACTGTCG ACTCGTCTTC CTGAGTCTTG 

+2LNEL RKT IELL KKQ N A A A Q A A ING VTM 
321 TGAATGAGTT AAGAAAAACC ATTGAGCTGC TAAAGAAACA GAACGCAGCT GCCCAGGCTG CCATTAATGG AGTAATTAAC 
ACTTACTCAA TTCTTTTTGG TAACTCGACG ATTTCTTTGT CTTGCGTCGA CGGGTCCGAC GGTAATTACC ^£11™ 

+2TPEL NCK GNG TAQS ADL R I R ROHS SDS 
401 ACACCTGAGC TCAACTGCAA AGGAAACGGC ACTGCCCAGT CTGCAGACCT CCGCATCCGC AGGCAGCACT CCTCAGACAG 
TGTGGACTCG AGTTGACGTT TCCTTTGCCG TGACGGGTCA GACGTCTGGA GGCGTAGGCG TCCGTCGTGA GGAGTCTGTC 

+ 2 VSS INSA TSH SSV GSNI ESD SKK KKR 
481 SESSS ^TS^ GTG CCACCAGCCA CTCCAGCGTG GGCAGCAACA TAGAGAGTGA CTCAAAGAAG AAGAAGCGGA 
GCAGAGGTCG TAGTTGTCAC GGTGGTCGGT GAGGTCGCAC CCGTCGTTGT ATCTCTCACT GAGTTTCTTC TTCTTCGCCT 

+2KNWV NEL RSSF KQA FGK KKSP KSA S S H 
561 i^T^T CAATGAGTTA CGCAGCTCCT TCAAGCAAGC TTTCGGGAAG AAGAAGTCCC CAAAATCTGC GTCCTCTCAT 
TCTTGACCCA GTTACTCAAT GCGTCGAGGA AGTTCGTTCG AAAGCCCTTC TTCTTCAGGG GTTTTAGACG CAGGAGAGTA 

♦2 SDIE EMT D S S LPSS PKL PHN GSTG STP 
641 7^?*™ AGGAGATGAC GGATTCTTCT TTGCCTTCCT CACCAAAGTT ACCGCACAAT GGGTCCACAG GTTCCACCCC 
AGTCTATAAC TCCTCTACTG CCTAAGAAGA AACGGAAGGA GTGGTTTCAA TGGCGTGTTA CCCAGGTGTC CAAGGTGGGG 

+2 LLR NSHS MSL ISE CMOS EAE TVM OLR 
721 ^TS^? 5 AATTCTCACT CCAACTCTCT AATTTCAGAA TGCATGGATA GTGAAGCTGA GACCGTCATG CAGCTCCGAA 
TGACGACTCC TTAAGAGTGA GGTTGAGAGA TTAAAGTCTT ACGTACCTAT CACTTCGACT CTGGCAGTAC GTCGAGGCTT 



+2NELR DKE MKLT 
801 ATGAGTTAAG AGACAAGGAG ATGAAGCTGA CGGAT^T 
TACTCAATTC TCTGTTCCTC TACTTCGACT GCCTATA 



3ATOT 
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ACCESSION 

NID 

KEYWORDS 
SOURCE 
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REFERENCE 
AUTHORS 



TITLE 
JOURNAL 
COMMENT 



FEATURES 

source 



mRNA 



AA418158 610 bp mRNA EST 19-MAY-1997

ZVltlle* 1 S ° areS NhHMPU 31 HOm ° 9apienfl CDNA Cl ° ne ^35 5 . 

g2079968 

EST. 

human. 

Homo sapiens 

Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;

Vertebrata; Eutheria; Primates; Catarrhini; Hominidae; Homo.

1 (bases 1 to 610) 

Hillier,L., Allen,M., Bowles, L* , Dubuque, T., Geisel,G., Jost S 
Moo^V'^TT;"'; LG ' N " L€ ™ G '' Marra^., Martin J. ' 
wh£*'v*' I 0 *! 11 ?****'*-' Ste P toe,M., Tan,F., Theisin^B , 
White, Y., Wylie,T., Waterston,R. and Wilson, R. 
WashU-Merck EST Project 1997 
Unpublished (1997) 

Contact: Wilson RK 
WashU-Merck EST Project 

Washington University School of Medicine 

4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108

Fax: 314 286 1810

Email: est@watson.wustl.edu

This clone is available royalty-free through LLNL; contact the
IMAGE Consortium (info@image.llnl.gov) for further information.
Seq primer: -28M13 rev2 ET from Amersham
High quality sequence stop: 492. 

Location/Qualifiers 

1. .610 

/organism="Homo sapiens"

/note="Organ: mixed (see below); Vector: pT7T3D-Pac
(Pharmacia) with a modified polylinker; Site_1: Not I;
Site_2: Eco RI; Equal amounts of plasmid DNA from three
normalized libraries (melanocyte 2NbHM, pregnant uterus
NbHPU, and fetal heart NbHH19W) were mixed, and ss circles
were made in vitro. Following HAP purification, this DNA
was used as tracer in a subtractive hybridization
reaction. The driver was PCR-amplified cDNAs from pools of
5,000 clones made from the same 3 libraries. The pools
consisted of I.M.A.G.E. clones 260232-265223,
340488-345479, and 484488-489479."

/clone="767735"

/clone_lib="Soares NhHMPu S1"

/tissue_type="Pooled human melanocyte, fetal heart, and
pregnant uterus"
/lab_host="DH10B"
<1..>610
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/clones" 5D16" 

/clone_lib="Zebrafish ICRFzfla"
/sex="mixed"

/tissue_type="pooled 26-somite embryos"

/lab_host="XL1-blue MRF"
mRNA complement(<1..>418)

BASE COUNT 108 a 87 c 78 g 145 t 

ORIGIN 

1 tttacatttt ttgaggaaga tgctaatggt ctattctgat tcaatgattt atgctaagct
61 aagctaaaat gctcctgtca aatcctgaga tcagctgaat gaattaaaaa tttggtaaaa
121 ctcaactgtc taactctagg ggagttgtaa aatgggccta tttccctaaa aagtaatgtt
181 actttaagag catgatggtc caccagtttc actgtctaaa ttttgttatt ccataagcta
241 atcttctctg ggcattttga cgattttaac actaacctgt gggtaatctg cgtcccccgt
301 aaactggaca tggtttcttc cagattctgt ctcagatcag caatgttctt cactgtacgc
361 atccgtctag tttctggatc ttctcctgag atctcctcca ggcactgttt ggcggtct



// 
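A GenBank-style record carries a built-in consistency check: the four BASE COUNT figures should add up to the record length, and they should match a direct count over the ORIGIN rows. The sketch below illustrates that check. The helper names are hypothetical and the ORIGIN row in the usage lines is a toy example, not data from the record.

```python
import re
from collections import Counter


def parse_base_count(line: str) -> dict:
    """Parse a line such as 'BASE COUNT 108 a 87 c 78 g 145 t' into a dict."""
    return {base: int(n) for n, base in re.findall(r"(\d+)\s+([acgt])\b", line)}


def count_origin(rows) -> dict:
    """Count a/c/g/t in ORIGIN rows, ignoring coordinates and spaces."""
    counts = Counter()
    for row in rows:
        counts.update(ch for ch in row.lower() if ch in "acgt")
    return dict(counts)


stated = parse_base_count("BASE COUNT 108 a 87 c 78 g 145 t")
print(sum(stated.values()))               # compare with 'Length = 418'

# Toy ORIGIN row (hypothetical data) showing the direct count.
print(count_origin(["1 acgtacgtaa cc"]))
```

For the record above, the stated counts sum to 418, which agrees with the reported length; a direct count over the seven ORIGIN rows can be compared with `stated` in the same way.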

—v. I * k A Ot;n/l *) I n n AQCCXA O fan^fflfi <s1 ^.oKrnf -i ah Tf T RF7f 1 a nanin rori r\ rrHNtt 
y^,*^--^-^-.-. 

clone 5D16 3' 
Length = 418 

Minus Strand HSPs: 

Score = 195 (87.9 bits), Expect = 9.9e-18, P = 9.9e-18
Identities = 37/46 (80%), Positives = 42/46 (91%), Frame = -3

Query: 627 TGQPALEELTGEDPEARRLRTVKNIADLRQNLEETMSSLRGTQVTH 672

           T +  LEE++GEDPE RR+RTVKNIADLRQNLEETMSSLRGTQ+TH
Sbjct: 416 TAKQCLEEISGEDPETRRMRTVKNIADLRQNLEETMSSLRGTQITH 279
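The "Identities" figure of an HSP can be recomputed from its two aligned rows. The sketch below does this for the alignment above. It assumes the OCR-damaged query positions read ARR and NL, and that the U in the subject row is an N; these readings are inferred from the match line and are not independently confirmed. The function name is hypothetical.

```python
def identity(query: str, sbjct: str):
    """Return (identical positions, alignment length, rounded percent)."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    same = sum(q == s for q, s in zip(query, sbjct))
    return same, len(query), round(100 * same / len(query))


query = "TGQPALEELTGEDPEARRLRTVKNIADLRQNLEETMSSLRGTQVTH"
sbjct = "TAKQCLEEISGEDPETRRMRTVKNIADLRQNLEETMSSLRGTQITH"
print(identity(query, sbjct))
```

Under those readings the count agrees with the reported 37/46 (80%), which supports the reconstruction of the damaged residues.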

MOUSE 2 
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AA208994 527 bp mRNA EST 18-FEB-1997 

mw75e12.r1 Soares mouse NML Mus musculus cDNA clone 676558 5'.

AA208994

g1807004

EST . 

house mouse. 
Mus musculus 

Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata; 
Vertebrata; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; 
Mus . 

1 (bases 1 to 527) 

Marra,M., Hillier,L., Allen,M., Bowles,M., Dietrich,N., Dubuque,T.,
Geisel,S., Kucaba,T., Lacy,M., Le,M., Martin,J., Morris,M.,
Schellenberg,K., Steptoe,M., Tan,F., Underwood,K., Moore,B.,
Theising,B., Wylie,T., Lennon,G., Soares,B., Wilson,R. and
Waterston,R.

The WashU-HHMI Mouse EST Project 
Unpublished (1996) 

Contact: Marra M/Mouse EST Project 

WashU-HHMI Mouse EST Project 

Washington University School of Medicine

4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108 



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Tel: 314 286 1800 
Fax: 314 286 1810 
Email: mouseest@watson.wustl.edu

This clone is available royalty-free through LLNL; contact the
IMAGE Consortium (info@image.llnl.gov) for further information.
MGI:416262

Putative full length read
vector to vector length is 535
Seq primer: -28M13 rev2 ET from Amersham
High quality sequence stop: 478.

FEATURES Location/Qualifiers
source 1..527

/organism="Mus musculus"



/note="Vector: pT7T3D-Pac (Pharmacia) with a modified
polylinker; Site_1: Not I; Site_2: Eco RI; 1st strand cDNA
was primed with a Not I - oligo(dT) primer;
double-stranded cDNA was ligated to Eco RI adaptors
(Pharmacia), digested with Not I and cloned into the Not I
and Eco RI sites of the modified pT7T3 vector. Library
constructed and normalized by Bento Soares."

/clone="676558"
/clone_lib="Soares mouse NML"
/tissue_type="Liver"
/lab_host="DH10B"
mRNA <1..>527 



« :snss: ssxs ssg- g a tgggtt t g 

121 cagaagaaaa atgccaotca alll+t tttccagcac atcctccatc taotcoacgo 

181 aaaa g 9 tgtc t^ctaact ItlTatllT T** 3 ^ g cctcccag g 
241 agcagagtct ggglaa"catg accat^f f* caaat * c tcaccttgtg gcagccttcg 
301 aggattcaga actg^cgal ttaagaalaa coatf^ aaCtat * acc 9ctgagcag 9 
361 ctgoocaggc tgccattaat 111? CCat ° ga 9 ct ^tgaagaaa cagaatgcag 
421 gcagtgccag gctac^gacc taogcatcco aCaCgCCa * a ^tcaactgc aaaggaaatg 
481 atcaatagcg Lc^ c \%~ ~£ =~ g tgtctccagt 

gb|AA208994|AA208994 mw75e12.r1 Soares mouse NML Mus musculus cDNA

Length = 527 
Plus Strand HSPs:

Identities = 110/143 (76%), Positives = 114/143 (79%), Frame = +3
Query: 1571 KVSALTTQLTANAHLVAAFEQSLGNMTIRLQSLTMTAEQKDSELNELRKTIEXXXXXXXX 1630






KVSALTTQLTANAHLVAAFEQSLGNMTIRLQSLTMTAEQKDSELNELRKTIE 
Sbjct: 183 KVSALTTQLTANAHLVAAFEQSLGNMTIRLQSLTMTAEQKDSELNELRKTIELLKKQNAA 362 

Query: 1631 XXXXXXGVTNTPELNCKGNGTAQ 1653 

GVTNTPELNCKGNG+A+ 
Sbjct: 363 AQAAINGVINTPELNCKGNGSAR 431 

Score = 116 (52.3 bits), Expect = 2.3e-76, Sum P(2) = 2.3e-76
Identities = 24/25 (96%), Positives = 25/25 (100%), Frame = +1

Query: 1661 RQHSSDSVSSINSATSHSSVGSNIE 1685 

+QHSSDSVSSINSATSHSSVGSNIE 
Sbjct: 451 QQHSSDSVSSINSATSHSSVGSNIE 525 



LOCUS 

DEFINITION 

ACCESSION 
NID 

KEYWORDS 
SOURCE 

ORGANISM 



REFERENCE 
AUTHORS 



TITLE 
JOURNAL 
COMMENT 



FEATURES 

source 



AA049124 337 bp mRNA EST 09-SEP-1996 

mj45f04.r1 Soares mouse embryo NbME13.5 14.5 Mus musculus cDNA

clone 479167 5'.

AA049124

g1528794

EST. 

house mouse . 
Mus musculus 

Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
Vertebrata; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; 
Mus. 

1 (bases 1 to 337) 

Marra,M., Hillier,L., Allen,M., Bowles,M., Dietrich,N., Dubuque,T.,
Geisel,S., Kucaba,T., Lacy,M., Le,M., Martin,J., Morris,M.,
Schellenberg,K., Steptoe,M., Tan,F., Underwood,K., Moore,B.,
Theising,B., Wylie,T., Lennon,G., Soares,B., Wilson,R. and
Waterston,R.

The WashU-HHMI Mouse EST Project 
Unpublished (1996) 

Contact: Marra M/Mouse EST Project 

WashU-HHMI Mouse EST Project 

Washington University School of Medicine

4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108 

Tel: 314 286 1800 

Fax: 314 286 1810 

Email: mouseest@watson.wustl.edu 

This clone is available royalty-free through LLNL; contact the
IMAGE Consortium (info@image.llnl.gov) for further information. 
MGI :289911 

Seq primer: -28M13 rev2 from Amersham 
High quality sequence stop: 292. 
Location/Qualifiers 
1. .337 

/organism="Mus musculus"
/strain="C57BL/6J"

/note="Vector: pT7T3D-Pac (Pharmacia) with a modified
polylinker; Site_1: Not I; Site_2: Eco RI; 1st strand cDNA
was primed with a Not I - oligo(dT) primer [5'






mRNA 
BASE COUNT 
ORIGIN 

1 



80 



TGTTACCAATCTGAAGTGGGAGCGGCCGCGGAAATTTTTTTTTTTTTTTTTTTTTTTT 
T 3'], on equal amounts of mRNA from 2 13.5dpc and 2
14.5dpc embryos [total RNA provided by Minoru Ko, Wayne
State Univ., from 2 ]; double-stranded cDNA was ligated to
Eco RI adaptors (Pharmacia), digested with Not I and
cloned into the Not I and Eco RI sites of the modified
pT7T3 vector. Library went through one round of
normalization, and was constructed by Bento Soares and
M.Fatima Bonaldo."
/clone="479167"

/clone_lib="Soares mouse embryo NbME13.5 14.5"
/sex="unknown"
/tissue_type="embryo"

/dev_stage="13.5-14.5dpc total fetus"

/lab_host="DH10B"
<1. .>337 
a 101 c 97 g 



59 t 



fii 9 gggcaccgag gtcaccgaga cccctgctca ttcagtcccc cacactagac 

61 tgttccaagc caatgaagag gaggagccag agaagaagga ggtatcagaa ctgcgctctg 
121 aactatggga aaaagagatg aagctcacgg atatccggtt ggaggccctc aactctgccc 
181 accagctgga ccagcttcgg gagaccatgc acaatatgca gttggaggtg gacctgctga 
241 aagcagagaa tgaccggctg aaggttgccc ccgggccctc ctca^gctgc actcclgggc 
301 aggtccctgg gtcatcggct ctgtcgtccc ctcgacg ccagggc 



gb|AA049124|AA049124 mj46f04.r1 Soares mouse embryo NbME13.5 14.5 Mus
musculus cDNA clone 479167 5'

Length = 337

Plus Strand HSPs:

Score = 206 (92.9 bits), Expect = 3.9e-19, P = 3.9e-19
Identities = 42/60 (70%), Positives = 51/60 (85%), Frame = +3

Query: 1760 DSEAETVMQLRNELRDKEMKLTDIRLEALSSAHQLDQLREAMNRMQSEIEKLKAENDRLK 1819
            + E + V +LR+EL +KEMKLTDIRLEAL+SAHQLDQLRE M+ MQ E++ LKAENDRLK
Sbjct:   84 EPEKKEVSELRSELWEKEMKLTDIRLEALNSAHQLDQLRETMHNMQLEVDLLKAENDRLK 263



LOCUS 

DEFINITION 

ACCESSION 
NID 

KEYWORDS 
SOURCE 

ORGANISM 



REFERENCE 



AA185349 348 bp mRNA EST 07-JAN-1997 

mu51c03.r1 Soares mouse lymph node NbMLN Mus musculus cDNA clone
642916 5'.

AA185349

g1769059

EST. 

house mouse. 
Mus musculus 

Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata; 
Vertebrata; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; 

1 (bases 1 to 348) 






AUTHORS 



TITLE 
JOURNAL 
COMMENT 



Marra,M., Hillier,L., Allen,M., Bowles,M., Dietrich,N., Dubuque,T.,
Geisel,S., Kucaba,T., Lacy,M., Le,M., Martin,J., Morris,M.,
Schellenberg,K., Steptoe,M., Tan,F., Underwood,K., Moore,B.,
Theising,B., Wylie,T., Lennon,G., Soares,B., Wilson,R. and
Waterston,R.

The WashU-HHMI Mouse EST Project 
Unpublished (1996) 

Contact: Marra M/Mouse EST Project 

WashU-HHMI Mouse EST Project 

Washington University School of Medicine

4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108

Tel: 314 286 1800

Fax: 314 286 1810

Email: mouseest@watson.wustl.edu



This clone is available royalty-free through LLNL; contact the



FEATURES 

source 



IMAGE Consortium (info@image.llnl.gov) for further information.
MGI:394908

Seq primer: -28M13 rev2 from Amersham
High quality sequence stop: 336.
Location/Qualifiers
1..348

/organism="Mus musculus"
/strain="C57BL/6J"

/note="Vector: pT7T3D-Pac (Pharmacia) with a modified
polylinker; Site_1: Not I; Site_2: Eco RI; [5'
TGTTACCAATCTGAAGTGGGAGCGGCCGCGATACTTTTTTTTTTTTTTTTTTTTTTTT 
3']; double-stranded cDNA was ligated to Eco RI adaptors
(Pharmacia), digested with Not I and cloned into the Not 
I and Eco RI sites of the modified pT7T3 vector. RNA 
provided by Dr. Bertrand Jordan. Library constructed and 
normalized by Bento Soares and M.Fatima Bonaldo."
/clone="642916"

/clone_lib="Soares mouse lymph node NbMLN"

/sex="male"

/dev_stage="4 weeks"

/lab_host="DH10B"
mRNA <1..>348

BASE COUNT 93 a 95 c 78 g 82 t 

ORIGIN 

1 attcggcact gaggggatga ataatccacc aaattagtgt gtacatagga gttgctgggc 
61 ccccccccac tcttatctgc tgtagctagc ctctccctaa gcctcgcatc ttctctaaat 
121 ctatctctgc gttcttacca cttgttctgg ccaatagaac tccggatcaa gaggcagaat 
181 tcctcagata gcatctccag cctcaacagc atcaccagcc attccagcat cggcagcagc 
241 aaagatgctg atgccaagaa gaaaaagaag aagagttggg taagtaaagg cttggagata 
301 ggcctgtgct aggagtcact caccctgttg cagggaactg accccttt 

// 



gb|AA185349|AA185349 mu51c03.r1 Soares mouse lymph node NbMLN Mus
musculus cDNA clone 642916 5'
Length = 348 






Plus Strand HSPs:

Score = 154 (69.4 bits), Expect = 4.4e-12, P = 4.4e-12
Identities = 27/42 (64%), Positives = 40/42 (95%), Frame = +1

Query: 1656 DLRIRRQHSSDSVSSINSATSHSSVGSNIESDSKKKKRKNWL 1697
            +LRI+RQ+SSDS+SS+NS TSHSS+GS+ ++D+KKKK+K+W+
Sbjct:  157 ELRIKRQNSSDSISSLNSITSHSSIGSSKDADAKKKKKKSWV 282
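In these translated searches the subject coordinates are nucleotide positions, so each aligned residue spans three bases and the start coordinate fixes the reading frame. The sketch below checks both relations for the HSPs reported in this section. The function names are hypothetical, and the frame convention used (frames +1 to +3 counted from the first base, frames -1 to -3 counted from the last base of the subject) is stated here as an assumption about how the search program labels frames.

```python
def hsp_codons(start: int, end: int) -> int:
    """Number of codons spanned by inclusive nucleotide coordinates."""
    span = abs(end - start) + 1
    if span % 3:
        raise ValueError("span is not a whole number of codons")
    return span // 3


def reading_frame(start: int, end: int, subject_length: int) -> int:
    """Frame +1..+3 for plus-strand HSPs, -1..-3 for minus-strand HSPs."""
    if start <= end:
        return (start - 1) % 3 + 1
    return -((subject_length - start) % 3 + 1)


# (start, end, subject length) taken from the Sbjct lines in this section.
for start, end, length in [(157, 282, 348), (84, 263, 337), (416, 279, 418)]:
    print(hsp_codons(start, end), reading_frame(start, end, length))
```

Under that convention the computed values reproduce the alignment lengths (42, 60 and 46 residues) and the Frame fields (+1, +3 and -3) printed with the corresponding HSPs.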






PI 6 (/RE M 



"SIM output with parameters: 
substitution scores in BLOSUM62 
O = 12, E = 4"
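The parameters O and E are the gap-open and gap-extend penalties used together with the BLOSUM62 substitution scores. As a rough illustration, and assuming the common affine convention in which a gap of k residues costs O + k x E (the output itself does not state the formula), the penalty could be sketched as follows. The function name and defaults are hypothetical.

```python
def gap_penalty(length: int, gap_open: int = 12, gap_extend: int = 4) -> int:
    """Affine gap cost, assuming a gap of `length` residues costs O + length * E."""
    if length < 0:
        raise ValueError("gap length cannot be negative")
    if length == 0:
        return 0
    return gap_open + gap_extend * length


# Cost of gaps of one, two and three residues under the assumed convention.
print([gap_penalty(k) for k in (1, 2, 3)])
```

With these settings one long gap is penalised less than several short ones of the same total length, which is why the alignments that follow tend to show a few extended runs of dashes rather than many isolated ones.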

Sequence 1: hu1, 1702 residues

Sequence 2: hu2, 2350 residues

List of local alignments with score >= 100.0

46.8% identity in 1726 residues overlap; Score: 2538.0; Gap frequency: 9.3%

hul , 78 DPESQRKRTVQNVLDLRQNIiEETMSSI»RGSQVTHSSLEMTCYDS — DDANPRSVSSLSNR 

hu2, 639 DPE ARRLRTVKN I ADLRQNLEE TMS SIjRGTQVTH STLETTFDTNVTTEMSGRS ILSLTGR 

*** * -kick * *************** ***** ** * ** ** *. 

hul , 136 SSPLSWRYGQSSPRLQAGDAPSVGGSCRSEGTPAWYMHGERAHYSHTMPMRSPSKLSHIS 

hu2 , 699 PTPLSWRLGQSSPRLQAGDAPSMGNGYPPRANASRFINTESGRYVYSAPLRRQIASRGSS 

***** ************** * * * * * * 

hul, 196 RLEL-VESLDSDEVDLKS GYMSDSDLMGKTMTEDDDITTG 

hu2 , 759 VCHVDVSDKAGDEMDLEGISMDAPGYMSDGDVLSKNI-RTDDITSGYMTDGGLGLYTRRL 

* ** ** ***** * * **** * 

hul, 235 WDESSSISSGLSDASDNLSSEEFNASSSLKSLP 

hu2, 818 NRLPDGMAWRETLQRNTSLGLGDADSWDDSSSVSSGISDTIDNLSTDDINTSSSISSYA 

** *** *** ** **** * •** * 

hul , 268 STPTASRRNSTIVIJRTDSEKRSLAESGIjSWFSESEEKAPKKLEYDSGSIiKMEPGTSKWRR 

hu2 , 878 NTPAS SRKNLDV — QTDAEKHSQVERNSLWSGDDVKKSDGGS — DSG-IKMEPG-SKWRR 

***** ****** * * *** ***** ***** 

hul , 328 ERPE SCDDSSKGGELKKPI SLGBPGSLKKGKTPPVAVTSPITHTAQ — SALKVAGK P 

hu2 , 932 NPSDVSDESDKSTSGKKNPVISQTGSWRRGMTAQVGITMPRTKPSAPAGALKTPGTGKTD 

* * * *★ ** ****** *** * 

hul, 383 EGKATDKGKLAVKNTGLQRSSSDAGRDRIiSDAKKPPSGIARPSTSG — SFGYKKPP-PAT 

hu2 , 992 DAKVSEKGRLSPKASQVKRSPSDAGRSSGDESKKPLPSSSRTPTANANSFGFKKQSGSAA 

* #* * * ** ***** *** * * *** ** + 

hul , 440 GTATVMQTG GSATLSKIQKSSGIPVKPVNGRKTSLDVSNSAEPGFLAFGARSNIQ 

hu2 , 1052 GLAMITASGVTVTSRSATLGKIPKSSAL-VSRSAGRKSSMDGAQNQDDGYLALSSRTNLQ 

* * * **** ** *** * *** * * * *★ * * * 

hul , 495 YRSLPRPAKSSSMSVTGGRGGPRPVSSSIDPSLLSTKQGGLTPSRLKEPTKVASGRTTPA 

hu2 , 1111 YRSLPRPSKSNSRNGAGNRSS— — TSSID-SNISSKSAGLPVPKLREFSKTALGSSLPG 

******* ** * * * **** * * * ** ****** * 

hul , 555 PVNQTDREKEKAKAKAVALDSDNISLKSIGSPESTPKNQASHPTATKLAELPPTPLRATA 

hu2, 1166 LVNQTDKEKGISSDNESVASCNSVKVNPAAQPVSSPAQTSLQPGAKYPDVASPTLRRLFG 

***** ** * * * * * ** * 






hu2 unc -QLQSQEETKERRHSHTIGGLPESDDQSEIiPSPPALPMSLSAKGQL 

hu2, 14 06 HNTLP^GI^YTPTSQLRTQEDAKEWLRSHSAGGLQDTAANSPFSSGSSVTSPSGTRFNF 



J"^' 615 KSFVKPPSLANLDKV-NSHSLDLPSSSDTTHAS— KVPDLHATSSASGGPL- P 

hu2, 1226 GKPTKQVPIATAENMKNSWISNPHATMTQQGNLDSPSGSGVLSSGSSSPLYSKHVDLNO 

* ** * *•*■** 

Ju2' i 111 ^fT!f P ^ ILN1NSASFSQGLELMSGFSVI,KETRMY ^SGL H RSMESLQMPMS---LP 

hu2, 1286 spiasspssahsapsnsltwgtnassssavskdglgfqsvsslhts^sidislssggS 

** * • . . * . **.*.. . . 

J"4' 721 SAFPSSTPVPTPPAPPAAP-TEEETEELTWSGSPRAGQLDSNQ RD 

hu2 , 1346 SHNSSTGLIASSKDDSLTPFVRTNSVKTTLSESPLSSPAASPKFCRSTLPKKQDSDPHLD 

* ***** # 

hul ' 765 RNTLPKKGLRY 1 

** ** ** ** **★ * * 

hu2* ^VSPTAAT TPRITRSNSIPTHEAAFELVSGSQM— GSTLSLAERPKGMIRSGSF 

hu2, 1466 SQIiASPTTVTQMSLSNPTMLRTHSLSNADGQyDPYTDSRFRNSSMSLDEKSRTMSRSGSF 

*** * *** *•«***. »»».„ 

hul' iS« RDPTDDVHGSVLSIiASSASSTYSSAEERMQSEQIRKLRRELESSQEKVATLTSQLSANAN 

hu2, 1526 RDGFEEVHGSSLSLVSSTLSVYSTPEEKCQSE— IRKLRREIiDASQEKVSALTTQLTANAH 

.... ... .. . .. .. ^ ^ ^ 

h,^' ™ LVA ^ QSL ^ MTSI ^^ TMEKD,ra I-LDLRETIDFLKXKNSEAQAVIQGAI,NASET 

hu2, 1585 LVAAFEQSLGNMTIRI.QSLTMTAEQKDSELNEIJIKT1ELLIUCQNAAAQAAINGVINTPEL 

......... ... . ... .. .. .. .. ^ ^ ... . . * . 

hus' ^::~~-- E ^ IraQNSSDSISSLNS ITSHSSIGSSKDADAKKKKKKSWVYELRSSF 
&U2, 1645 NCKGNGTAQSADLRIRRQHSSDSVSSINSATSHSSVGSNIESDSKKKKRKNW LRSSF 



iu2' \™l ^^ GPKSASSYSDIEEIATPDSSA PSSPKLQHGSTETASPSIK S STLSSVGT D VT 

i»u2, 1702 KQAFGKKKSPKSASSHSDIEE — TTDSSLPSSPKLPHNGSTGSTPLLRNSHSNSL 

** ...... ..... . ... ...... . ... 

h«l, U07 EGPAHPAPHTRLFHANEEEEPEKKEVSELRSELWEKEMKLTDIRIjEALNSAHQLDQLRET 

1755 ISECMDSEAETVMQLRNEI<RDKEMKLTDIRIiEALSSAHQLDQLREA 

* * * *• *» .*.».*»**.♦.» *♦«..,**». 



hUl ' 1167 MHNMQLEVDLLKAEKDRLKVAPGPSSGSTPGQVPGS 

hu2, 1857 tsldmi^dtgecsarkeggrhvkiwsfqeemkwkedsrphlf: 

* * * ** * * ... .... . . 

hu2' lilt ^^^^^^^^^^^^^^^^'^^^^'^^^^^^^^^^^O^PPEKPPCBRGVUN — -I 

1917 dgwrrlfkeyiihv dpvsqlglnsdsvi,gysigeikrsntsetpellpcgylvgentti 
* » .... ... ... * .... ..»..* 



hu2 ,p«, " nn "w 1 ^ v UJjIjKAENDRIiKVAPGPSSGSTPGQVPGSSAIiS— SPRRSLGLALTHSFGPSLA 

hu2, 1801 «NP^ Q SE I EKI^ N DR L K--- S ESQGSGCSRAPSOVSI S A S PRQS«GLS-QHSLNMEt 

* + **★ * ** * it 

T 2 : ^ !L D ™ G ff !^!Y^-^^ MPI, QH"KGDLKQQEFFI.GC S KVSGKVD M K H L 

L XGC I GVS GKTK WDVL 






hul, 1341 SVSIiKGLKEKCTOSLVFETLIPKPMMQHYISLLLKHRRLVLSGPSGTGKTYLTNRIjAEYL 

hu2 , 1977 SVTVKGIAENSLDSLVTESLIPKPILQRYVSLLIEHRRIILSGPSGTGKTYLANRLSEYI 

** ★ ** * ****** ***** * * *** *** ************ *** ** 

hul , 1401 VERSGREVTEGIVSTFNMHQQSCKDIiQLYLSNLANQIDRETGIGDVPLVILLDDLSEAGS 

hu2 , 2037 VLREGRELTDGVIATFNVDHKSSKELRQYLSNLADQCNSENNAVDMPLVIILDNLHHVSS 

* * *** * * *** * * * ****** * * * **** ** * * 

hul , 1461 ISELVNGALTCKYHKCPYIIGTTNQPVKMTPNHGIiHLSFRMLTPSNNVEPANGFLVRYLR 

hu2 , 2097 LGE I FNGLLNCK YHKCP Y I IGTMNQATS STPNLQLHHNFRWVLCANHTEPVKGFLGRFLR 

* ** * ************ ** *** ** ** * * 

hul , 1521 RKLVESDSDINANKEELLRVLDWVPKLWYHLHTFUEKHSTSDFLIGPCFFLSCPIGIEDF 

hu2, 2157 RKLME TE I S GRVRNMELVK 1 1 DWI PK VWH HLNRFLE AH S S S DVT I GPRLFLSCP I D VDGS 

*** * ** ** ** * ** *** ** ** *** ****** 

hul, 1581 RTWFIDLWNNSIIPYLQEGAKDGIKVHGQKAAWEDPVEWVRDTLPWPSAQQDQS — KLYH 

hu2 , 2217 RVWFTDLWNYSIIPYLLEAVREGLQLYGRRAPWEDPAKWVMDTYPWAASPQQHEWPPLLQ 

* ** **** ****** * * * * **** ** ** ** * * 

hul, 1639 LPPPTVGPHSIASPPEDRTVKDSTPSSLDSDPLMAMLLKLQEAANY 

hu2, 2277 LRPEDVGFDGYSMPREGSTSKQMPPSDAEGDPLMNMLMRLQEAANY 

* * ** * * * * ** **** ** ******* 



WARNING: 49 local alignments have not been reported because of score < 100.0 



"SIM output with parameters: 
substitution scores in BLOSUM62 
O = 12, E = 4"

1583 residues 
2350 residues 

List of local alignments with score >= 54.0 



32.8% identity in 504 residues overlap; Score: 490.0; Gap frequency: 6.9% 



Ce 1 , 1058 VIE LK QE LKE R D S AL YE VRLDNLDRARE VD VLRE T VNKLKTE NK QLKKE VDKL TN GPATR 

hu2 f 1766 VMQLRNE LRDKE MKLTD I R LEAL S S AH QLDQLRE AMNRMQSE IEKIiK AE NDRLK SE S QGS 

* * ** ****** *** * * ***** 

Cel, 1118 ASSRASIPVIYD DEHVYDAACSST SASQSSKRSSGCNS IKVTVNV 

hu2 , 1826 GCSRAPSQVSISASPRQSMGLSQHSLNLTESTSLDMLLDDTGECSARKEGGRHVKIWSF 

*** + ** ***** 

Cel , 1163 DIAGEISSIVNPDKEIIVGYLAMSTSQSCWKDIDVSILGLFEVYLSRIDVEHQLGIDARD 



Sequence 1: Cel, 
Sequence 2 : hu2 , 






hu2, 1886 QEEMKWKEDSRPHL-FLIGCIGVS-GKTKWDVLDGWRRLFKEYIIHVDPVSQLGLNS D 

** * 

hu2, 1943 SVLGYSIGEIKRSNTSETPELLPCGy-LVGENTTISVTVKGLMNSLDSLVFESLIPKP^ 

* *** *** * ***** ^ ^ 

Cel, 1283 IWVKSIMBIWLVIJ^^ 
hu2, 2002 WMVSIMEH^ 

* ************ # 

Cel, 1341 LLQVERRLEKILRSKESCI VILDNIPKNRIATWSVFANV-PLQNNEGPFWCTV 

hu2, 2062 LROYLSNLADQCNSENNAVDMPLVIILDNL-HHVSSLGEIFNGLLNCKYHKCPYriGTM 

* * * * **** * 

* * * 

Cel, 1395 NRY--QIPELQIHHNFKMSVMSNRLE GFILRYLRRRAVEDEYRLTVQMPSELFKIID 

hu2, 2120 NQATSSTPNLQLHHNFRWVLCANHTEPVKGrLGRFLRRKLMETEISGRVRN-MELVKIID 

Cel, 1450 FFPIALQAVNNFIEKTNSVDVTVGPRACXNCPLTVDGSREWFIRLWNENFIPYLERVARD 

hu2, 2179 WIPKVWHHLNRFLEAHSSSDVTIGPRLFLSCPIDVDGSRVWFTDLWNYSIIPYLLEAVRE 
» * • * »**..*** ♦* ***•» , 

Cel, 1510 GKKTFGRCTSFEDPTDIVSEKWPW 

hu2, 2239 GLQLYGRRAPWEDPAKWVMDTYPW 



35.5% identity in 112 residues overlap; Score: 165.0; Gap frequency: 1.8% 

Cel,   11 IYTDWANRHLSKGSLSKSIRDISNDFRDYRLVSQLINVIVPINEFSPAFTKRLAKITSNL 

hu2,   11 IYTDWANHYIiTKSGHKRLIKDLQQDVTDGVUiAQIIQWA--NEKIEDINGCPKNRSQMI 

******* * * * # ****** ** 

Cel,   71 DGLETCLDYLKNLGLDCSKLTKTDIDSGNLGAVLQLLFLLSTYKQKLRQLKK 
hu2,   69 EN I DACLNFLAAKGINIQGLSAEE IRNGNLKAILGLFFSLSR YKQQQQQPQK 
** * * * * *** ****** *** * * 

24.8% identity in 163 residues overlap; Score: 80.0; Gap frequency: 3.7% 

i III ™?^° rs ^ 

hu2, 1594 GNMTIRLQSLTMTAEQK DSELNEIiRKTIELLKKQNAAAQAAINGVINTPELNCKGNG 

* * ***** ... . 

Cel,  995 RSSMS SSSKSSKQEKISLSSFGK-NKKSWIRSSLSKFTKKKNK 
hu2, 1651 TAQSADLRIRRQHSSDSVSSINSATSHSSVGSNIESDSKKKKR 

* ** * * *** 

58.6% identity in 31 residues overlap; Score: 74.0; Gap frequency: 6.5% 






Cel, 653 GYPDiJfEDSSSLSSGISDNNELDDISTDDLS 

hu2, 840 GDADSWDDSSSVSSGISDT — IDNLSTDDIN 

* * **** ****** * **** 



42.9% identity in 60 residues overlap; Score: 64.0; Gap frequency: 6.7% 



Cel, 984 RQPSLESVASHRSSMSSSSKSSKQEKISLSSFGKNKKSWIRSSLSK-FTKKKNKNYDEAH 

hu2, 1661 RQHSSDSVSSINSATSHSSVGS NIESDSKKKKRKNWLRSSFKQAFGKKKSPKSASSH 

*********** * **** *** * *** * 



22.0% identity in 91 residues overlap; Score: 56.0; Gap frequency: 0.0% 



Cd , 140 3KZjPSFRVAT3ATA3ATNFw5NFFQM5TSiUjQTPQS«Z5 

hu2, 177 SRIiSGPTARVSAAGSEAKTRGGSTTANNRRSQSFNNYDKSKPVTSPPPPPSSHEKEPLAS 

****** ** * ** 

Cel, 200 STTSSNNTNSFRPSSRSSGNNNVGSTISTSA 

hu2, 237 SASSHPGMSDNAPASLESGSSSTPTNCSTSS 

* * * * ** *** 



WARNING: 44 local alignments have not been reported because of score < 54.0 
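
The identity and gap-frequency figures in the SIM output above can be recomputed from any pair of aligned rows. The sketch below is illustrative only: the function name `alignment_stats` and the toy rows are hypothetical, and the definitions used (identity counted over gap-free columns, gap frequency counted over all columns) are an assumption inferred from the reported numbers rather than taken from SIM's documentation. For example, 42.9% identity in a 60-column overlap with a 6.7% gap frequency is consistent with 24 matches in 56 gap-free columns.

```python
def alignment_stats(row_a: str, row_b: str, gap: str = "-"):
    """Return (percent identity, percent gap frequency) for two aligned rows.

    Assumed definitions (not confirmed against SIM itself):
      identity      = identical residues / columns without a gap
      gap frequency = columns with a gap / all columns
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same number of columns")

    columns = len(row_a)
    # A column counts as gapped if either row carries the gap character.
    gap_columns = sum(1 for a, b in zip(row_a, row_b) if a == gap or b == gap)
    # A match is an identical, non-gap residue in both rows.
    matches = sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)

    ungapped = columns - gap_columns
    identity = 100.0 * matches / ungapped if ungapped else 0.0
    gap_frequency = 100.0 * gap_columns / columns if columns else 0.0
    return identity, gap_frequency


# Toy rows, not taken from the figure.
identity, gap_frequency = alignment_stats("ACDE-G", "ACDFHG")
print(f"{identity:.1f}% identity, {gap_frequency:.1f}% gap frequency")
```

Rows copied from the printed alignments would first need their gap runs restored, since the extracted text no longer preserves column positions.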







Figure 12b 






Tuesday, 18 November 1997 10:09 

Fig 13 pCB201 (1 > 5082) Site and Sequence 

Enzymes: 100 of 146 enzymes (Filtered) 

Settings: Linear, Certain Sites Only, Standard Genetic Code 
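
The listing that follows prints a one-letter translation under the nucleotide sequence using the standard genetic code. As a rough illustration of how such a translation row is derived, the sketch below translates a DNA string codon by codon. The names `CODON_TABLE` and `translate` are hypothetical, the example input is a toy sequence rather than a transcription of the figure, and stop codons are written as `*` here.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate one reading frame (0, 1 or 2) of a DNA string.

    Whitespace is ignored; codons containing anything other than
    A, C, G or T are reported as 'X'.
    """
    seq = "".join(dna.split()).upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame, len(seq) - 2, 3)
    )


# Toy input: a start codon followed by a short histidine run.
print(translate("ATG GGG GGT TCT CAT CAT CAT"))
```

Translating the other two frames, or the reverse complement, only changes the `frame` argument or the input string.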

GACGGATCGGGAGATCTCCCGATCCCCTATGGTCGACTC TCAGTACAATC TGCTCTGATGCCGCATAGTTAAGCCAGTATC TGCTCCCTGCTTGTG TGT" 

CTGCCTAGCCCTCTAGAGGGCTAGGGGATACCAGCTGAGAGTCATGTTAGACGAGACTACGGCGTATCAArTCGGTCATAGACGAGGGACGAACACAC^" '~ ' 
T 0 R , E 1 5 , R 5 P M V D S Q Y N L L . C R I V K P V S A P C L C V 

GGAGGTCGCTGAGFAGTGCGCGAGCAAAATTTAAGCTACAACAAGGCAAGGCTTGACCGACAATTGCATGAAGAATCTGC TTAGGGTT AGGCGfTTTGCI- 
CCTCCAGCGACTCATCACGCGCTCGTTTTAAATTCGATGTTGTTCCGTTCCGAACTGGCTGTTAACGTACTTCTTAGACGAArcCCAATCCGCAAAACG^ 
G G R v v R E Q N L 5 Y M K A R L D R Q L H E E S A . G . A F C 

CTGCTTCGCGATGTACGGGCCAGATATACGCGTTGACATTGATTATTGACTAGTTATTAATAGTAATCAATTA CGGGGTCATTAGT^^ 
GACGAAGCGCTACATGCCCGGTCTATATGCGCAACTGTAACTAATAACTGATCAATAATTATCATTAGTTAATGCCCCAGTAATCAAGTATCGGGTAr^- 
A A , S R C T G , Q f Y A L T L I [ Q . L L I V 1 N Y G V I 5 S . P I V 

TGGAGTTCCGCGTTACaTaaCIIaCGG tAAATGGCCCGCCTGGC TGACCGCCCAACGACCCCCGCCCATTGACGTCAATAATGACGT ATGTTCC CA TAG7 

ACCTCAAGGCGCAATGTATTGAATGCCATTTACC6GGCGGACCGACTGGCG6GTTGCTGGGG6CGGGTAACTGCAGTTATTACTGCATACAAGGGTATCA 
G V P , R Y 1 . T Y C K V P A V L T A Q R P P P | 0VNN0VCSH3 

^CGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGACTATTTACGGTAAACTGCCCACTTGGCAGTA^ 

TT GCGGTTATCCCTGAAAGGTAACTGCAGTTACCCACCTGATAAATGCCATTTGACGGGTGAACCGTCATGTAGTTCACATAGTATACGGTTCATGCGG3 ^ 
N A N , R D F P L T S M G G L F T V N C P L GST SSVSYAKYA 

CCT AT TGACGTCAATGACGGTAAATGGCCCGCCTGGC AT TATGCCC AG TAC AT GACCTTATGGGACTTTCC TACTTGGC AGTACATCTACGTATTAGTCA 
GGATAACTGCAGTTACTGCCATTTACCGGGCGGACCGTAATACGGGTCATGTACTGGAATACCCTGAAAGGATGAACCGTCATGTAGATGCATAATCAG* ^ 
P Y • R Q • R . ' MARL A L C P V H D L M G L S Y L A V H L R I S H 

TCGCTATTACCATGGTGATGCGG TTTTGGCAG TAC ATCAATGGGCGTGGATAGCGGTTTGAC TCACGGGGATTTCCAAGT CTCCACCCCATTGACG TCAA 

agcgataatggtaccac tacgccaaaaccgtcatgtagttacccgcacctatcgccaaac tgagtgcccctaaaggttcagaggtggggtaactgcagt" 7:0 

P Y Y , H G D / V . L A V H Q V A V I A V , L T G I S K S P P H . R Q 
T «GAGTTTGT7TTCGCACCAAAATCAACGCGACT^^ 

a^ctcaaacaaaaccgtggttttagttgccctsaaagg " :y ' 

V E f V 1 A P K 3 T , G t S K M S . Q L R P IDANGR.ACrVG 

gtctatataagcagagctctctgcctaac^ 

cagatatattcg ^tcgagagaccgattgatcti ttgggtgacgaatgaccgaatagctttaattatgc TGAG TGATATCCCTCTGGGTTCGmCCGAT'" " 

l : > , 

I— T7 promoior pilming site —J 

G '- V K3SSLA V-*THCL LAYRN.YDSL.GDPSWL* 



GTTTAAAcrT^A GCTTACCATGGGGGGTTCTCA^ATCATCATCATCATGGTATGGCTAGCATGACTGGTGGACAGCAAATGGGTCGGGATCTGTAC.^- 

CAAATrTGAATTCGAATGGTACCCCCCAAGAGTAGTAGTAGTAGTAGTACCATACCGATCGTACTGACCACCTGTCGTTTACCCAGCCCTAGACATG.-TG 

I > | 

I— ProBond binding domain I | 

F '< L [ K L T H G G S H h H H H HGMASM TGGQQMGRCL YD 
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' '^ J "^ A ^ GA ^ A ^^ A ^^ A ^^^ A ^^9^^^^^^^^^^GTCAA ^* a ^ ATaT CAGTCTCCCTCAAAGGTCrGAA GGAGAAArGCCi~C5ii'"A 
'" TiK '~^^ ^^^ ^^^^^^ ^^^^^^^^^^^^^GGCAGCTCCACAjTTATTGTArAGTCAGAGGG AGTTTCCAGACTTCC TC TTT Afar ZKrrrJ- 




° ° ° V " R ' H " P P C . R « G V N N , S VSLKGLKEKCVD 



GCC7GGTGTTCGAGACGCTGATCCCCAA G CCG ATGArGCAGCACTACATAA3CCTCCrGCTGAAGCACCGGCGCCTCGT CCrCTCGGG.-CCCAGCGG.-,- 
CGGACCACAAGC TCTGCCACTAGGGGTTCGGCTAC TACGTCGTGATGTATTCGGAGGACGAC TTCGTGGCCGCGGAGCAGGA GAGrf rni^fjfrTr^rriTT^ 




3 L V F E T L - ' P K P M M Q H y ' S L L L K H R R L y L s G p s G T 
CGGCAAGACCTACCTGACCAATCGCTTGGCCG AGTACCTGGTGGAGCGCTCTGGCCGTGAGGTCACAGAGGGCATCGTCAG CACCTTCAACATG-^CA- 

cccgttctggatggactggttagcgaaccggctcatggaccacctcgcgagaccggcacIccagtgtctcccgtagcagIcgtggaagT^^^ 




G K T Y 1 T N R L * E ' ■-. * E » s G » evteg ivstfmnh 



CAGTCTTGCAAGGATCTGCAACTGTATCTTTCCAA CCTAGCCAACCAGATA5ACCGGGAAACAGGAATTGGGGATGTGC CCCTGGTGAT7CTATTr.,,T-: 
CA3AACGTTCCTAGACGTTGACATAGAAAGGT TGGATCGGTTGGTCTATCTGGCCCTTTGTCCTT AACCCCTACACGGGGACC Ar TAaftATAirrT.*" 



-pCB201 insert = U4 



-U4 ORF 



3 C K D l Q L YLSNLANQ I D R E T G I 



P L V iLL D 

A^-3AGTGAAGCAGGCTCCATCAGTGAGTT C G TCAAT G GGGCCCTCACCT3CAAGTATCATAAATGTCCCTAT ATTATAGGTACCAn-, aT rA-:r---.--.- 
^ACrCACTTCGTCCGAG G TAGTCACrCAACCAGT rACCCCGGGAGTGGA=GTTCATAGrATTTACAGGGATA TAATArcCATG iS T,^rA7^^ 

-pCB201 insert = U4 



-U4 ORF 



D ' S E A ° S - ' S £ L » » G * ■• T C K , H K C P r , t G T T , « p 



AAAAArGACACCCAACCATGGCTTGCACTTG AGCTTCAGGATGTTGACCTrCTCCAACAACGrGGAGCCAGCCAATGO CTTCCTGGTTrGrTA-CT^G: 

^ A ^^TGGGrTGGTACCGAACGTGAACTC3AAGTCCTACAACTGGAAGA G GTTGTTGCACCTCGG '"'^ 




" ^TPNH GLHLSFR HL T F S N H V E P A H G F L V 5 V L F 
BASE COUNT      173 a    168 c    141 g    128 t 
ORIGIN 
        1 gggccctcta gggtgcctgc tgcaggaagc acagcatagg tccagggagc ctctaattta 
       61 aataggagaa gtcagagctt taacagcatt gacaaaaaca agcctccaaa ttatgcaaat 
      121 ggaaacgaaa aagattcctc caaaggacct caatcgtctt caggtgtaaa tggtaacgtg 
      181 cagcctccca gtactgctgg gcagcctcct gcctctgcca tcccttctcc aagtgccagc 
      241 aagccctggc gcacgaagtc catgaatgtc aaacacagtg ccacctccac catgttgact 
      301 gtaaagcagt caagtacagc cacctccccc acaccatctt cagacagact gaaggcaacc 
      361 tgtctcagaa ggggtcaaaa ctgctccctc aggacagaaa tccatgcttg agaaattcaa 
      421 gctagtcaat gcccggactg ctttacgccc cccgcagcct cccagttcag gacctagtga 
      481 tggtgggaag gatgatgatg ccttttctga atctggtgaa atggaaggtt ttaacagtgg 
      541 tctgaatagt ggtggctcaa caaatagcag tcccaaagtg tcacctaagt tggcccctcc 
      601 aaaagctgga 
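
A BASE COUNT line of this kind can be cross-checked against the ORIGIN block by tallying the bases directly. The following is a minimal sketch, assuming the ORIGIN lines are available as plain strings; the function name `base_count` is hypothetical, and the two sample lines are only the first two ten-base groups of the listing plus one further group, not the full record.

```python
from collections import Counter


def base_count(origin_lines):
    """Tally a, c, g and t across ORIGIN-style lines.

    Position numbers, spaces and any other characters are ignored,
    so stray extraction artefacts do not affect the totals.
    """
    counts = Counter()
    for line in origin_lines:
        counts.update(ch for ch in line.lower() if ch in "acgt")
    return {base: counts[base] for base in "acgt"}


sample = [
    "1 gggccctcta gggtgcctgc",
    "61 aataggagaa",
]
print(base_count(sample))
```

Summing the four totals also gives the sequence length, which can be compared with the last position number in the ORIGIN block.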
LOCUS       AA495042      418 bp    mRNA            EST       27-JUN-1997 
DEFINITION  fa05f06.s1 Zebrafish ICRFzfls Danio rerio cDNA clone 5D16 3' 
ACCESSION   AA495042 
NID         g2225470 
KEYWORDS    EST. 
SOURCE      zebrafish. 
  ORGANISM  Danio rerio 
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata; 
            Vertebrata; Actinopterygii; Neopterygii; Teleostei; Euteleostei; 
            Ostariophysi; Cypriniformes; Cyprinoidea; Cyprinidae; Rasborinae; 
            Danio. 
REFERENCE   1 (bases 1 to 418) 
  AUTHORS   Clark,M., Lehrach,H., Johnson,S., Marra,M., Eddy,S., Hillier,L., 
            Allen,M., Bowles,L., Dubuque,T., Geisel,G., Jost,S., Kucaba,T., 
            Lacy,M., Le,N., Lennon,G., Martin,J., Moore,B., Schellenberg,K., 
            Steptoe,M., Tan,F., Theising,B., White,Y., Wylie,T., Waterston,R. 
            and Wilson,R. 
  TITLE     WashU Zebrafish EST Project 
  JOURNAL   Unpublished (1997) 
COMMENT     Contact: Steve Johnson 
            Washington University School of Medicine 
            4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108 
            Tel: 314 286 1800 
            Fax: 314 286 1810 
            Email: est@watson.wustl.edu 
            Steve Johnson lab internal ID - P2_60 NOTE - For this library, the 
            CLONE id field represents a position identifier on the original 
            cDNA library preparation plate. cDNA Library Preparation: Matthew 
            Clark. cDNA Library Arrayed by: Matthew Clark. DNA Sequencing by: 
            Washington University Genome Sequencing Center Clone distribution: 
            Genome Systems, St. Louis, and Max Planck Institut fuer Molekulare 
            Genetik, Berlin Tel +49 30 84 13 1235 
            Seq primer: -40m13 ET from Amersham 
            High quality sequence stop: 416. 
FEATURES             Location/Qualifiers 
     source          1..418 
                     /organism="Danio rerio" 
                     /note="Vector: pSPORT1; Site_1: NotI; Site_2: SalI; 1st 
                     strand cDNA was primed with a Not I - oligo(dT)15 primer 
                     [5' pGACTAGTTCTAGATCGCGAGCGGCCGCCCTTTTTTTTTTTTTTT 3'], on 
                     mRNA from pooled 26 somite zebrafish embryos; 
                     double-stranded cDNA was ligated to Sal I adaptors (BRL), 
                     digested with Not I and cloned into the Not I and Sal I 
                     sites of the pSPORT1 vector (BRL). Library was constructed 
                     by Matthew Clark (Lehrach lab; ICRF, London and Max 
                     Planck Institut fuer Molekulare Genetik, Berlin) and was 
                     not biochemically normalised. 70,000 clones from this 
                     library were arrayed on high density filters and 
                     subsequently screened by oligonucleotide hybridization 
                     fingerprinting to identify unique or minimally redundant 
                     clones for more intensive analysis." 
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AGGAAGC TGGTAGAGTCAGACAGCGACATCAATGCCAAC AAGGAAGAGCTGCTTCGGGTGCTCGACTGGGTACCCAAGCTGTGGTATCATC TCCACACC " 
TCC TTCG-CCATCTC AGTC TGTCGCTGTAGTTACGGTTGTTCCTTCTCGACGAAGCCCACGAGCTGACCCATGGGTTCGACACCATAGTAG AGG7G TGGA 



-pCB201 insert = U4 

-U4 ORF 

RKLVESDSD I NANKEELLRV/LDVVPKLVYHLHT 

TCCTTGAGAAGCACAGCACCTCAGACTTCCTCATCGGCCCTTGCTTCTTTCTGTCGTGTCCCATTGGCATTGAGGACTTCCGGACCTGGTTCAT7GACC7 
AGGAACTCTTCGTGTCGTGGAGTCTGAAGGAGTAGCCGGGAACGAAGAAAGACAGCACAGGGTAACCGTAACTCCTGAAGGCCTGGACCAAGTAACTGGA 

-pCB201 insert = U4 

-U4 ORF 

F L £ K H 5 T 5 DFLIGPCFFLSCPIGIEOFRTVF IDL 

GTGGAACAACTCTATCATTCCCTATCTACAGGAAGGAGCCAAGGATGGGATAAAGGTCCATGGACAGAAAGCTGCTTGGGAGGACCCAGTGGAATG3GTC 
CACCTTGTTGAGATAGTAAGGGATAGATGTCCTTCCTCGGTTCCTACCCTATTTCCAGGTACCTGTCTTTCGACG AACCC TCC TGGGTC AC C TT AC CC A3 

-pCB201 insert = U4 

-U4 ORF 

v N NSIIPYLQEGAKDGIKVHGQKAAWEOPVEV V 

C'^GGACACACTTCCCTGGCCATCAGCCCAACAAGACCAATCAAAGCTGTACCACC TGCCCCCACCCACCGTGGGCCCTCACAGCATTGCCTCACCT-C'::; 
GCCCTGTGTGAAGGGACCGGTAGTCGGGTTGTTCTGGTTAGTrTCGAC ATGGT GGACGGGGGTGGGTGGCACCCGGGAGTGTCGTAACGGAG TGGA3GGC; 

-pCB201 insert = U4 

-U4 ORF 

sdtipvpsaqqdqsklyhlppptvgphsiaspf 


agsatag3acagtc aaa3acagcaccccaagttctctggact cagatcctc tgatggccatgctgc tgaaacttcaagaagctgccaactacaf'g ag tl 
tcctatc:"gtcagtttctgtcgtggggttcaagagacctgagtctaggagac taccggtacgacgactttgaagttcttcgacggttgatgtaac: 

-pCB201 insert = U4 

-U4 ORF 

e 0 r t v k d s t psslosdplmamllklqeaan y 1 e .;. 

tccagatcgagaaaccatcctggaccccaaccttcaggcaacact ttaagggttcggcaatcactgtcacccccggacagcagaacgctggcatcagcta 
aggtc tagc tcrttggtaggacctggggttggaagtccgttgrgaaattcccaagccgttag fgac ag tgggggcctgtcgtc ttgcgaccgtagtcga" 

" : pC8201 insert = U4 



-U4 ORF 

DPET1LDPNLQATL G F G N H C H P R T A E P ' *■> H <} L 
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TCTTAGCTCCTCCTCTCCCCTCTCCTCTTTCAGAGCACTGGCTCTCCAGCCCCAG GAGGAGAACAGGAGGGAGGAGGAGATGAAAGAGGAGGGACAGGT- 
AGAATCG AGGAGGAGAGGGGAGAGGAGAAAGTCTCGTGACCGAGAGGTCGGGGTCCTCCTCTTGTCCTCCC TCCTCCTCTACTTTCTCCTCCCTGTCCAA 



-pCB201 insert = U4 



S • L L L S P L L F Q S T G S P A P G G E Q EGGGDERGGTG 

CTTGGTGCTGTACCTTTGAGAACTTCCTAGGAAGGAATGGTGGGGTGGCGTTTGGGAAC TTGTGCCCCCTAAACACATTTACTGGCCTCCTCTAATGACT 
GAACCACGAC ATGGAAACTCTTGAAGGATCCTTCC TTACCACCCCACCGCAAACCCTTGAACACGGGGGATTTGTGTAAATGACCGGAGGAGATTACTGA 



- pCB201 insert = U4 



SVCCTFENFLGRNGGVAFGNLCPLNTFTGLL L 

TTGGGGAAAAGATGATTCTGGGTCTTTCCCTTGACTTCTTGTTTCAATTAC AAACTCCT GGGCTTTCTGGGGAGGGGTTC AGAAAACATCAAAACACTGC 
AACCCCTTTTCTACTAAGACCCAGAAAGGGAACTGAAGAACAAAGTTAATGTTTGAGGACCCGAAAGACCCCTCCCCAAGTCTTTTGTAGTTTTGTGACC3 



-pCB201 insert = U4 



W G K D D S G S f P . L L V S t T N SVAFVGGVQKTSKHC 

AGCAGTTCCTAAATGATTC TCAC AAGCAACCCTGAGAGAGACAGTCTTGTGAGGGAGATC T GGGGGAGGCAGGAAGCTCCTCAGATTTTCTCACAGACCC 
TCGTCAAGGATTTACTAAGAGTGTTCGTTGGGACTCTCTCTGTCAGAACACTCCCTCTAGACCCCCTCCGTCCTTCGAGGAGTCTAAAAGAGTGTCTGGG 



-pCB201 insert = U4 



SSS . M ILT S N PER OSL VREIVGRQEAPQ IF5QT 

T TCCCAATTCCATC ACC AC TGCCAACAC TCGTCCGGAAT TCTGCAGATATCCAGCAC AGTGGCGGCCGC T CGAGTCTAGAGGGCCCGTTTAAACCCGC 
AAGG3TTAAGGTAGTGGTGACGGTTGTGAGCAGGCCTTAAGACGTCTATAGGTCGTGTCACCGCCGGCGAGCTCAGATCTCCCGGGCAAATTTGGGCGAC 

-pCB201 insert = U4 

*-^ N3 I TT ANT RPzFC R y pa q wrplesrgpv. tr 
ATCAGCCTCGACTGTGCCTTCTAGTTGCCAGCCATCTGTTGTTTGCCCCTCCCCCGTGCCTTCCTTGACCCT GGAAGGTGCCACTCCCACTGTCCTTTCC 

tagtcggagctgacacggaagatcaacggtcggtagacaacaaacggggagggggcacggaaggaactgggaccttccacggtgagggtgacaggaaagg 

3 A 5 T v p S S C Q P 5 VVCPSPVPSLTLEGATPTVL .i 

taataaaatgaggaaattgcatcgcattgtctgagtaggtgtcattctattctggggggtggggtggggcaggacagcaagggggaggattggg aagaca 

A TT AT TT TACTCCTTT AACGTAGCGTAACAGACTC ATCC AC AGTAAGATAAGACCCCCCACCCC ACCCCGFCC TGTCGTTCCCCCTCCTAACCCTTCTGT 
• NEE I A S H C L S R C H S 1 L G G G VGODSKGEDVED 

ATAGC AGGCATGCTGGGGATGCGGTGGGCTCTATGGCTTCTGAGGCGGAAAGAACC AGCTGGGGCTCTAGGGGGTATCCCCACGCGCCCTG TAGCGGCGC 
TATCGTCCGTACGACCCCTACGCCACCCGAGATACCGAAGACTCCGCCTTTCTTGGTCGACCCCGAGATCCCCCATAGGGGTGCGCGGGACATCGCCGCG X 
NS R HAG0AVGSMASEAERT3VG5RGYPHAPCSGA 



ATTAAGCGCGGCGGGTGTGGTGGTTACGCGCAGCGTGACCGCTACACTTGCCAG CGCCCTAGCGCCCGCTCCTTTCGCTTTCTTCCCTTCCTTTCTCGC-: 
TAA7TCGCGCCGCCCAC ACCACC AATGCGCGTCGC AC TGGCGATGTGAACGGTCGCGGGATCGCGGGCGAGGAAAGCGAAAGAAGGGAAGG AAAGAGCG^ 
l S A ^ A G V V V T R S V T A T L ASALAPAPFAFFP SFLA 

ACGTTCGCCGGCrTTCCCCGTCAAGCTCTAAATCGGGGCATCCCTTTAGGGTTCCGATTTAGTGCrTTACG GCACC TCGACCCC AAAAAAC TTGAT TAG^ 
TGCAAGCGGCCGAAAGGGGCAGTTCGAGATTTAGCCCCGTAGGGAAATCCC AAGGCTAAATCACGAAATGCCG TGGAGC TGGGGT TT TT TG AAC TAATCC 
T F A G -PR Q * LNR G IPLGFRFSALRHLOPKKLO. 

GTGATGGTTCACGTAGTGGGCCATCGCCCTGATAGACGGTTTTTCGCCCTTTGACGTTGGAGTC CACGTTCTTTAATAGTGGACTCTTGTTCCAAACTG^ 
CACTACCAAGTGCATCACCCGGTAGCGGGACTATCTGCCAAAAAGCGGGAAACTGCAACCTCAGGTGCAAGAAATTATCACCTGAGAACAAGGTTTGAC: 
G 0 G $ R S G P S P . , T V F R P L T L E S T F F N S G L L F Q T G 

AACAACACTCAACCCTATCTCGGTCTATTCTTTTGATTTATAAGGGATTTTGGGGATTTCGGCCTATTGGTTAAAAA ATGAGCTGATTTAACAAAAArTT 

ttg ttgtgagttgggatagagccagataagaaaac taaatattccctaaaacccctaaagccggataaccaattttttactcgac taaattgtttttaaa 

T T L , N P 1 , S V Y S r D L - G 1 L G I S A Y W L K N E L I Q K F 

aacgcgaattaattctgtggaatgtgtgtcagttagggtgtggaaagtccccaggctccccaggcaggcagaagta tgcaaagcatgcatctcaatta^^ 

TTGCGCTTAATTAAGACACCTTACACACAGTCAATCCCACACCTTTCAGGGGTCCGAGGGGTCCGTCCGTCTTCATACGTTTCGfACGTAG ^ 

* A N , FCGMCVS . GVESPQAPQAGRSMQSMHLN 


CAGCAACCAGGTGTGGAAAGTCCCCAGGCTCCCCAGCAGGCAGAAGTATGCAAAGCATGCAT CTCAATTAG tcagcaaccataqtcccgcccctaactcc 
G7CGTTGGTCCACACCTTTCAGGGGTCCGAGGGGTCGTCCGTCTTCATACGTTTCGTACGTAGAGTTAATCAGTCGTTGGTATCAGGGCGGGGATTGAG3 
3 A T R C G K S P G S P A S R S M Q S M HLN.SAT1VPPLTP 

GCCCATCCCGCCCCTAACTCCGCCCAGTTCCGCCCATTC |CCGCCCCATGGCTGACTAATTTTTTTTATTTATGCAGAGGCCGAGGCC GCCTCTGCCTCT 
CGGGTAGGGCGGGGATTGAGGCGGGTCAAGGCGGGTAAGAGGCGGGGTACCGACTGATTAAAAAAAATAAATACGTCTCCGGCTCCGGCG^ ^ 
P 1 P . P L T P P S S A H S P P H G . L I F F I Y A E A E A A S A 3 

GA5CTATTCC AGAAGTAGTGAGGAGGCrTTTTTGGAGGCCTAGGCTTTTGCAAAAAGCTCCCGGGAGCT TG TATATCCATTTTCGGAT C TGATCAAGAGA 
CTCGATAAGGTCTTCATCACTCCTCCGAAAAAACCTCCGGATCCGAAAACGTTTTTCGAGGGCCCTCGAACATATAGGTAAAAGCCTAGACTAGTTCTCT ^ 
E L F Q K • ; G G F F G G L G F C K K L P G A C IS I F G S 0 Q E 

CAGGATGAGGATCGTTTCGCATGATTGAACAAGATGG ATTGCACGCAGGT TCTCCGGCCGCTTGGGTGGAG AGGCTATTC GGC TA TGAC TGGGCAC AAC A 

G-CTACTCCTAGCAAAGCGTACTAACTTGTTCTACCTAACGTGCGTCCAAGAGGCCGGCGAACCCACCTCTCCGATAAGCCGATACTGACCCGTGTTG- ^ ' 
T 5 ■ G S F R M I E Q D G L H A G S P A A V V E R L F G Y 0 W A Q 0 

GACAATCGGCTGCTCTGATGCCGCCGTGTTCCGGCTGTCAGCGCAGGGGCGCCCGGTTCTTTTTGTCAAGACCGACCTGTCCG GTGCCCTGAA 

:"ttag:cgacgagactacggcggcacaaggccgacagtcgcgtccccgcgggccaagaaaaacagttctggctggacaggccacgggacttacttg.u- "' :> 

S D A A V F R L S A Q G R P V t FVKTDLSGALNEL 



i G C 



CAGGACGAGGCAGCGCGGCrATCGTGGCTGGCCACGACGGGCGTTCCTTGCGCAGCTGTGCTCGACGTTGTCACTGAA GCGGGAAGGGACTGGCrGCTA- 

gtcctgctccgtcgcgccgatagc^ 

'"' ° E A A R L S V L , A T T G V P C A A V L 0 V y TEA G R D 'J L L 
-G-GCGAAGTGCC3GGGCAGGATC^ 

acccgcttcacggccccgtcctagaggacagtagagtggaacgaggacggctctttcataggtagtaccgactacgttacgccgccgacgtatgcgaac- 

L G E V P G Q D L L S 5 H L A P A £ K V S I N A D A M R R L H T L 0 

"ccggct acctgcccattcgaccaccaagcgaaac atcgcatcgagcgagc acgtactcggatggaagccggtcttgtcgatcag gatgatc tggacgaa 
aggccgatggacgggtaagctggtggttcgctttgtagcgtagctcgctcgtgcatgagcctaccttcggccagaacagctagt^ L ' 3C< 

P A T C P F 0 H Q A K H R I E r A r t R M E A G L V D Q D D L 0 E 

ga-jcatc^ggggctcgcgccagccg^^ 

C TIGTAG TCCCCGAGCGCGGTCGGCTTGACAAGCGGTCCGAGTTCCGCGCG TACGGGC TGCCGCTCC TAGAGCAGC AC TGGG TACCGC TACGGACGAAC J ~~ 
£ H G G L A P A E L F A * > « ARMPDGEOLVVTHGDACL 

CGAATATCATGGTGGAAAATGGCCGC TTTTCTGGATTCA TCGAC TG TGGCCGGCTGGGTGTGGCGGACCGC TATCAGGACA TAGC6TTGGC TACCCGTGA 

^^ a tagtaccaccttttaccggcgaaaagacctaagtagc tgac accggccgacccacaccgcctggcgatagtcctgtatcgcaacccatgggcac^ £5C ' : 

P H 1 " V E N , G R F ,S G F I D C G R L G V A D R Y Q D I A L A T D 

r attgctgaagagcttggcggcgaatgggctgaccgcttcctcgtgctttacggtatcgccgctcccgattcgcagcgcatc gccttctatcgccttctt 

ATAACGACTTCTCGAACCGCCGCTTACCCGAC ^^^GAAGGAGCACGAAATGCCATAGCGGCGAGGGCTAAGCGTCGCGTAGCGGAAGATAGCGGAAGAA ^ 
I A E E L G G E W A 0 R F L V L Y G I A A P D S Q R I A F Y R L J 

GACGAGTTCTTCTGAGCGGGACTCTGGGGTTCGAAATGACCGACCAAGCGACGCCCAACCTGCCATCACGAGATTTC 

ctgctcaagaagactcgccctgagaccccaagctttactggctggttcgctgcgggttggacggtagtgctctaaagctaaggt^ i7C,; 



DEFF.AGLVGSK 



P T K » W PTCHHE IS IPPPPSM 



AAGGTTGGGCTT CGGAATCGTTTTCCGGGACGCCGGCTGGATGATCCTCCAGCGCGGGGATCTCATGCTGGAGTTCTTCG^ 

ttccaacccgaagccttagcaaaaggccctgcggccgacctactaggaggtcgcgccccIagagtacgacctcaagaagcgggtggggttga 



KGVASESFSGTP 



A G 



S S S A G ISCUSSSPTPTCtL 



GCAGCTTATAATGGTTACAAATAAAGCAATAGCATCACAAATTTCACAAATAAAGCATTTTT 



TTCACTGCA TTCTAGTTGTGGTTTGTCCAAACTCATCA 

TTTGAGTAGT 

° L '■ " V T * A ' A S 0 I 5 Q I KHFFHC I LVVVCPNS3 



cgtcgaaiattaccaatgtttatttcgttatcgtagtgtttaaagtgtttatttcgtaaaaaaagtgacgtaagatcaacaccaaacaggt — ' ' h 



ATGTATCTTATCATG TCTGTATACCGTCGACCTCTAGCTAGAGCTTGGCGTAATCATGGTCArAGCTGTTTCCTGTGTGAAATTGTTAT CCGCTCACAAT ' 

tacatagaatagtacagacatatggcagctggagatcgaIctcgaaccgcattagtaccagtatcgacaaaggacacacIttaacaataggcgagtgtt- *** 

H Y L .' " 5 V Y R R , P A R A W R W H G H S C F L C E I V I R S Q 

TCCACACAACATACGAGCCGGAAGCATAAAGTGTAAAGCCTGGGGTGCCTAATGAGTGAGCTAACTCACATTAArT GCGTTG 
AGGTGTGTTGTATGCTCGGCCTTCGTATTTCACAT7TCGGACCCCACGGATTACTCACTCGATTGAGTGTAATTAACGCAAC ^ 
F H T T Y Z P E A ■ S V KPGVPNE.ANSH.LBV 
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▼360 V370 v380 v390 v400 v410 v420 V430 v440 

SS* a faiv ACAS?AKFA?BTRLFEA KfTCn n r cjuvj. v'S EXJcS KLWsX&ruuL. TP nU^HAI^NSAB QUjtfLKS TMHKMQLgVDLIJAENDRLKVAP 
SSVGT : VTE . PAH : . PBTRXF : ANEIEEPEKXEVSZIJ^ELWKKKHKLTDIRT^ 

SSYSTBVTETCAHSVPHTRX^QAKKXXXPBTaOSVSKLRSE LWKKEMKLTD IRLEAXJZSAHQIXQIJUSTMSniMQUr^ 

-X0 "20 *30 -40 -50 -60 -70 *80 -90 

v450 v460 v470 v480 v490 v500 vSlO v520 v530 

GPSSGSI7GQVPGSSAI^SPPJ^UZ^THSrGPSXJU>TDI£P 

GPSSG . TPOQVPGSSALSSPRRSLGLAL : H : F : PSL : DTDLSPMDGISTCG : KEKVTLRWVRKPPQH 1 1XCDLKQQKITLGCSKVSG1CV 
GPSSGCTPGQVPGSSAI^SPRRSUaAIJHPFSPSLTDTDLSPMTCIST^ 

•100 -110 -120 -130 -140 -150 -160 -170 *180 

v540 vS50 v560 v570 v580 v590 v600 V610 v620 

DWKKUJEAVFQVPTOYISWIDPASTI^IJTZSM^ 

DWKKLD8AVFQVFTOYISKKDPASTLGLSTES IHGYS : SBVKRVLDAKPPBKPPCRJIGVNHISV : LKGLKZKCVDSLV7STLIPFPMMQH 
DWJWLDEAVPUVTTOYISKMDPASIIXII^TESIHGYS^^ 

-190 -200 -210 -220 -230 -240 -250 -260 -270 

V630 v640 v650 v660 v670 v680 v690 v700 v710 

YISIJXXHRRIATI^GPSGTGTITYLTNRIAEYLVZRS^ 

YISIJJ^HPJU.VI^PSCTGKTyT.TWUA5YXV2RSCRifVT : GIVSTrNMHQQSCKDICLVIJ5SIJWQIDRKTGIGDVFLVILr»DDLSXA 
yiSTJ.TiKfTRPX.VLSGPSGTG^TYLTNRt^SYLVKRSGRgVT^ 

-280 -290 -300 -310 -320 "330 -340 -350 -360 

v720 V730 v740 v750 v760 v770 v780 v790 v800 

GS ISELVNGALTCKYHXCPYI IGTTNQPVF>TrPNHGLHI^FTlKLTySNNVKPANGrL^ 

GSISKLVNGALTCAiu^CTYTIGTTNQPVKKrPNHGLSLSFTlKL : HANKEJKLLRVLDWVPKLW 

GS LSKLVKGALTCKYHKCPY1 IGTOIQPVKMTPNHGimSFRHLTrSNNVEPANGrLVR^ 

*370 -380 -390 -400 -410 -420 "430 -440 -450 

v610 v820 V830 v840 v850 v860 v870 v880 vB90 

YHiaTri^KBSTSDFL ICPCPPXSCP IGIEDTOTWF IDLHIWS I IPYI^EGAKIX; 
YHIJJTriJEKBSTSDri.ICPCPri^CPXGIBDFOTWriDLWW 
YHI£TFI£KBSTSDriaCPCrrXSCPIGIZDrRTWriDLV^ 

-460 -470 MBO M90 -500 -510 -520 -530 "540 

v900 v910 v920 v930 v940 v950 v960 

LPPPTVGFHSIASPPEDRT\flWSTPSSLDSDPIJiAKIJLlCT/QEA 

LPPP : VCPHS . ASPPEDRTVKDSTP : SLDSDPLMAKU*KLOEAANYIESPDRETII.DPNIiQATL 
IJ , PPSVGPUSTASPPEORTW5STPHSLI»DPLMAKIXKI^EA\NYIESPDRETII.DPNt^A^ 
-550 -560 -570 -580 *590 -600 
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SnaB I Nco I 

'ATGGGACTTTCCT^CTTGoCAGTACATCTACGTATTAGT CArcaCTATTACCATGGTGATGCGGTTTTG 

1 - • - - i -*r\ 

GC AG TAC ATC AA TG5GCG To GATiXGCGGTTTGACTCACG GoG AT TTCC A AGTCTCCACCCC A T~G ACS TC 

— ■ — 1 , 1 : ± - y0 

AATGGGAGTTTGTTT-GGCACCAAAA-CAACGGGACTTT CCAAAATGTCGTAACAACTCCGCCCCATTGA 

—— ' 1 — ^ ■ — 2:0 

CG^AAA. SGGCGSTACGCSTGTACojTGSGAGGTC TATA TA AGC AGAGC TCTCTGGC TAAC T AGAG AACC 

' ' ' ' 1 — 220 

L E N 
Asp 718 
Hind III ; Kpn I Bam HI 

cactgctac t gg:t-atc5aaattaatacgact:act atagggagacccaagcttggtaccgagctcgg 

~ — ' — — 250 

P: -- t 3L3nL iRLT;3R^<L3""UG 

BstX I 

Spe I Xma III EcoR I Pst I BspM II 

Arc:AC"t3TAACG3CCG::AGTGTGC"33AATT:TG CAGAT-ATGCCATCAAT7TCCGGATCTCAAGGA 

— , ' U20 

> ■ S M 33.:CAGILQIM?5ISGS0S 

*ps:sgsos 

ApaLI 

ACTCT7GACAACATTGA7GTGA7T3AGTTGAAGCAAGAG CTCAAAGAACGCGA7AGTGCACT7TACGAAG 

' ' ' ' ' " ^90 

|L:)NI D V!£.K0EL<E^CSALYE 
T L D N ! _C V I E L. K J £ l .< E 3 D S A L Y E 
■CC3CCTGACAATC GGA TCGTGCCCGC3AAGTT GATGTTCGAGGGAGACAG7GAACAAG77GAAAAC 

— 1 1 1 — ■ 560 

v R - 0 N U D * A P E V 0 V _ R E T V N K L K 7 
CGAGAAC AAGCAA TT AAAGAAAG AAG TGGAC AAAC TCACCAACG3TCCAGCC AC TCG 7GC TTC T7CCCGC 

^ ' — ' ' — ■ — — • 1 — £2Q 

' - * r 0 : - ' 1 E 7 C V L T N 0 » A T 5 A S S 3 
* * 0 L • E ■/ c < L T N G ? A T 3 A S 3 9 

GC:7CAArrcCA3rTA-CT^CGAC3^:GA3 :A"37:-A-3^rGCA3CG-3TAGCAorACATCAGCTAGTC 

— — 7 . :D 



p v : • :• c £ h v •' o a a : c 



S A 3 
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Asu II 



wATCT'CGAAACGATCCTCTGGCTGCAACTCAATCAAGGTTACTGTAAACGTGGACATCGCTGGAGAAAT 
, . ■ ■ : ■ = 770 

CSSK5.SS3CNSIKVTVMVCIAGSI 

C3SK*SS3CNSI<VTVNVCIAG.E: 

Pvu I 

Hpa I EcoR V 

CAGTTCGATCGTTAACCCGGACAAAGAGATiATCGTAGGATATCTTGCCiToTCAACCAGTCAGTCATGC 
■ i - SUO 

S 3 I V N ? 0 K t I IVGYLAMSTSQ5C 
SS ; V N P 0 K £ I IV3YLAM3.T5GSC 
"GoAAAGACA--3ATGTTTCTATTCTAGGACTATTTGAAGTCTACCTArCCAGAATTGATGTGGAGCATC 
. « ■ 1 9»0 

rfKD:3VSILGL"EVYLSRi:v£H 
>jt K 0 *D VS ILGL^EVYLSR I u V £ H 

Cla I Mlu I 

aacttggaatcgatgctcgtgattctat:cttggctatcaaattggtgaact*cgacgcgtcat"3Gaga 
■ i i 1 1 ■ 930 

CLG:DARj S IL 3YCIGE.P5VI30 

OLGIDAROcILGYOIGE.PhVIGO 

CTCCACmACCA'GATAACCAGCC atccaactgacattcttacttcctcaactacaatccgaatg - tcatg 
, , . . . . — '050 

S*-v; 7 S H P ' 0 TSS'TIPMFM 

S7-v iT 3Hi> r 0rLTSS - 7I!iMFM 

CAC3G t GCC3CACAGAGTCGCGTAGACAGTCTGGTCCTTGATATGCT"CTTCCAAA3CAAATGATTCTCC 

HGAA3SRV0SLVLDML'-P<CM!L 
HGAAQSRVDSLtfLOML-P<CriIL 

AACTCGTCAAG'CAATTTTGACAGAGAGACGTCTGGTGTTAGCTGGAGCAAC 7GGAA~"GGAAAGAGCAA 

■ • ■ j . j i . . . 

GLVKS'LTERPLVLAGATGIGKSK 
CLVKSILTERRLVLAGATGiG<SK 

Asu II 

ACTGGCGAAGACCCTGGCT3CT"ATGTAT;TATTC3AACAAATCAATCC3AAGATAG*ATTGTTAATaTC 

. . . . . i , , , — — . ■ ■ — — — * 260 

A K T L ^ A • v i 1 P T N 0 S E j 3 7 N I 

L A K T L A A V* S I R T N G S E Z > ! V N I 

Bsm I Bgl II 



120 



190 



AGCA"7CCT3AAAACAA" AAAGAA3AA7T3C 7~CAAG7G3AACG ACGCC ? 33 AAA A3 A"C TTGAGAAGCA 
5 ; P E N N * £ E - L 0 V E 3 R . E L 3 5 

s : - p e n n ► eelloverr.e* : l ^ 5 



'330 
Ava III 

Nsi I Xba I 



^ArCATGCATCSTAATTC-ASATAATATCCCAAAGAArCGAATTGCATTTGTTSTA-CCGTTTTTgC 



esc; v . r -On 



S C ' • - 0 N P < m i ^ r v v 



: 610 



3 < N 3 : a - V 7 z V F A 

V r A 

EcoR V 

aa.tg-cccactt-aaacaacgaaggtccat-tgtagtat^acagtcaaccgatatca^-ccctsag 

V * L C * E 2 P - v V ; r V N ft y C j p r 

* v p - c V fc £ G p - v v : - v N P Y r I p - 

C- 7 ^AArrCACC^AATTTCAAAATG7CAG7AATG'CGAArCGTCTCGAAGGA7TCA7C:TACGTTACC 

- C 1 H * F < M S V M S ,\ w L r r- r ■ L , v 

- 0 1 H ^ n f < m s v y s x , L - n _ : ^ ^ " 

CCoACGACGGGC3 o 7AGAGGA T 3 AG 7A TCG T CTA^C7GT^C AG ATGCCATCAGAGCC" TC AAA A TCA7 

L R * R " l/ ^ ? E Y R L r V J * * S E r < | ' 

L P * R - * E 3 E Y R L T V 3 H ? S £ ~ ? < J 

EcoR I 

-GACT-CT7CCC A^AGCTCr':AGGCC3TCAATA^rr7T--GAGAAAACGAA77C-G"y3A 'GTGACA 
' £ F ? • * L C A V N N r , E K T N s .. , v T 
■ ~ «- 5 A V M N » ! E K T V S V 0 V T 

Bam HI 

J 5 ' » * c l n c p l r v c g s , E v r : p L ,75 ° 

V G ? P - C L N C ^ L T V C 3 5 3 E V r • Q , 
GGAATGAGAACTTCATTCCA7ATTTGGAACG7GTTGCTAGAGAT3GCA AA^AAC C T 7CG5TCGC TGCAC 

•• m - u - ■ ~ ' ' "~ ' ?320 

* N - N " ' ? v L £ R v a R 3 G -. < - r , - 

'"' N E N "" • p ■ L £ R V A p T .: ; , - r .1 - - - 

Bam HI Tth I 

•-::T:c3AGGAr:;cAccGACArcG:cTCTAAAAAA^GccG-33r:c3A7GG^AAA^;ccGAGAAT 

S F E ^ ? T C : v J ,r k v o - r r- , r . , E N ''^ 



s F E D p r c ; v s i. ? ■.• f c 



3 E N 
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Tthl 

G"3C"CAAACGTCTTCAACTCCAAGACCTCGTCCC3"CAC C'GCCAACTCATCCC3ACAACACTTCAATC 

* - - - ■ 1 1 ■' ■ ' — " 

vl <r_0LQCLV35PANSS^.QHFN 
■/L^R.OLOCLVBSPANSS'JOHPN 

A ... I 

Ava I 

Xho I 

CCITCGA3TCG~T3ATCCA*^"GCATGCTACCAAGCATCA5ACCATCGACAACATTTGAACAGAAGACTC 



■960 



2030 



F L 
P L 



QLHA7KH0TI0NI 
QLHAT<nQTION I 

Asp 718 
Kpn I 



-AATC""CTCTC3CCTCTC:CCC3CTT"CCTTATCT"CGTACCG3TACCTGATGArTCCCCATTTTCCCC 



Ava I 
Xma I 
; Sma I 

CTTTTCCCCCCAATT'CCCAGAACCTCCTGTTCCCT-TGTTCCTAGTC CTCCCGGGTGCCGACGCCGAAG 
■ ' — — 1 ■ ■ * ■ ■ i ■ ■ — i — - 

cgatt-m^aaacctttt t ct"c:gaaaca tt-c::a7tgc"Cattaatagtcaaat7gaataaacagtg 

Ora tl 
Dra II 
Pss I 

: Apa I 

Pss I 

"A T3 TAC T7A AAAAAAAAAAAAAAAAAAAAAAAAAAGGGCCC 7AT7C7ATAG7G7CACC7AAA7GCTAGA 



2!00 



2! 70 
2240 



Bcl I 



GCTCGC7SATCAGCCTCGACTG7GCC"7CTAGTTG::aGCCATCTGTTGTTTGCCCCTCCCCCGTGCCTT 



CCTTGAC 


ZZ FGG A-GG7GCT 


AC TCCC AC 


TGTCCTTTCCTAATAAAArGAGGAAATTGCATCoCATTGTCT 


GA3T A'3G 


ToTCATTC "A7TC 


~GG GGGG ' 


333GTo33-3CAGGACAGCAAGG3GGAGGATTGGGAAGACAAT 








Pvu II 


AGCAGGC 


ATGC"GGGGA r GC 


GG GG 3 C T 


: TATGG:""C t 3AGGCGGAAAoAACCA3C"GG3GC7C taggg 




ccacgcgccctgt 


AGC ggcgc 


ATTAAGC3:GGC3GG7GTGGTGGTTAC3CGCAGCGTGACCGC 



23t0 

2380 
2U60 
2520 

2590 
2660 
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Nae I 



GGC 

2730 



' A ^C""SCCAGC3CCCTAGCGCCCSC7CCT7TCGC7rTCTTCCCTTCCTTTC7CGCCACS TTCGCg 

"T:CCCSrCAA3CrCTAAA7CGSGSCA7CCCrrUGGG77CCGA7T7 AG7GCT77ACGGCACCTCGACC 

^~ ' ' ' 2800 

Dra III 

CC AA ^AACT7GATTAGGGTGA7GST7CACG7AGTGGGCCA7CGCCC-GATAGACg S77TTTCGCCC777 

GAC3-7G3AG7CC^G77CT77AA7AG7GGAC7CTTG77CCAAACT GGAACAACACTCAACCC7ATCTCG ^ 

L " ' ' 111 * ' 2940 

G:CTAT7CrT773AT7TATAAGGGAT77TGGGGATrTCG3CC7ArTGG7TAAAAAATGAGCTGA7 TTAAC 

^ AA A77TAACG:3AAT7AA77C7S7GGAATG7G7G7CA5T-AG GG7G-3GAAAG7CCCCAGGC7CCCCA 

" ~ " — ' 3080 

Ava III 
Nsi I 

g6^GGCAGAAGTATGCAAAGCATS^-C7CAAT7AG7CA3CAAC CAGG7G7GGAAA67CCCCAGSC7CC 

— — 3!50 

Ava III 
Nsi I 

^ A 3CAGGCAGAAGTA7GCAAAGCATGCATC7CAAT7AG7CAGCA ACCATAS7CCC3CCCC7AAC7CCGC 

' ' ' — — 3220 

Nco I 

CCATC CCSCCCC7AAC7CC3CCCA577CC3CCCATTC-CC3CCCC A7GG:73AC7AA7T77TT77ATT7A 

" ' ' ' '■ ' ' 3290 

Stu I 
! Avr II 

-GCAGAGGCCGA33CCGCCTC-GCCTC7GAGCTATTCCAGAA G7AGTGAGGAGGCTTTT-TGGAGGCC7A 

"""" 1 1 ■ 336O 

Ava I 
Xma I 

Sma I Bcl I 

Qg^ - ■ T "5CAAAAA S C7CCCGGGAGC'7SrATA7CCA7-7 7CGGATC7GATCAAGAGACAGGA7GAGGA7 

- 1 ^ ' ' ' 3q30 

Xma III 

03:7'rG:ArG A7T3AAC^GA-3GA7TGCACG:AGG ":TCCG3CCGCTT3GGT33AGA3GCTATTCGG 

" ' ' 1 ' ' 3500 

Nar I 
Bbe I 

C^TGACr3GGC^ a ACASACAATC3GCTGC7CTGA7GC:5 CCG TGT ~C :33C 7G T: AGCGC AGGGGCGC 

" " ■ — ' 3570 
Pst I 

j-3GCmGCGCG3CTAT 

36 HO 



CC3G7~:TTT7~3TCAAGACCGA 


CCTGTCC3G"3CCCTGAATGAACTGCAGGACGA3GCAGCGCGGCTAT 


pal 1 


Fsp I 

Pvu II Tth I 


^- — '-'-«~/- \ ~ *- A r nr. r r r. — 


C C 7 ~G C 3 C AG C T 3 T G C T : G AC G T T G r r AC TG A A r, C GG .1 A AG G G AC 


*/*t**i*™« TT^rr"'!* ACT^"r^." 


33CAGGATC~CCTG7CATC7CACC'*3CTCC'3CCGAGAAAGTATCC 


A : CA . Gcw i 3m . o^AA I bcJUL'j 


--Tr.r * T A rr,r TTt* AT'" r GGC TACCTGCCCATTCGACCACCAAGCGA 


* at * ~ r .t ~ .* rr/c Artr^A^r Jlfi^ 


AC7CG3ATGGAAGCCG3TC r 7GTCGATCAGGA73ATCTGGACGAAGA 




BssH II 


oCATCAGoooL . .ol(3LC^oLv.o 


' <"--r.TTrr-C r ^GC""AAGGCGCGCATGCCC3ACGGC3AGGATC7C 


K)r>n 1 
JNCO 1 




GTC3TGACC:A*GoCGATGCC"'3 


:77GCC3AA t aTCA7G3TGGAAAA7G3CCGC7'7-C73GA7TCATCG 


Nae I 


Rsr II 


ACTGTGGCCSGC T3GG7GT3G-CG 


3ACCGCTATCA33ACATAGCGTT3GCTACCCG73A7ATTGCTGAAGA 


GC TTGGC 33CGA-T3GGC TG ACC 


CCT'CCTCG^GCT" AC3G7ATC3CC3CTCCC3A7*C3CAGCGCATC 




Asu II 


GCCTTC'ATCGCCTTCTTGACGA 


37TC7TCTGAGC3GGACTCTGGGG7 T CGAAA73ACCGACCAAGC3AC 


GCCCAACC73CCATCACGAGA" 


7CGA7TCCACC3CCGCCT"C"ATGAAAGGT7G33C77C3GAATC377 


Nae I 




77CCGGG AC3CC3GC TGGATGA" 


:CTCCA3CGC333GA*CTCA7GCTGGAGT7C7-:GCCCACCCCAAC7 


TGTT'A'TGCAGCTTATAATGG' 


*ACAAATAAAGCAA:a3CA7CACAAATTTCACAAA7AAAGCATTT' p 7 


Bsm I 


Sal I 


"tcac"g:a:-c tag~ t gtgg~~ 


*3TCC AAAC'C-TCAAT3 r A7CTT A*:aTG*C"3 t ATa:CG7C3ACC 


•CTAGC"A3AGCTT3GCGTA^" 


AioG"A"f AGCT3""C:"G73T3AAATTG- "-TCCGCTCACAAT'C 


C AC AC A AC AT AC 3AGCCG3AAGC 


A 'iAA3 T 3 "AAAGCC " 33GG7GCC 7A - T3AG" 3AGC 7 AAC 7C AC A • 



3710 
2760 
3850 
3920 

3S90 

<*060 

u;30 
"200 

"270 
"340 

"ii 10 
^"60 

^550 
i-620 
^690 
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<-370 

51 10 
5 : 60 

5250 



Pvu II 

AArTGCGTrGCGCTCAC-3:::2:TTTCCAGTC5G 3AAAC::GTC3TGC:AGC-3CA7-AATGAATC3GC 

— ' ^ ' ' 1 ■ £760 

CAACGCGC3GGGAGAGGCG3 3C37AT7GGGCGC7C77CCGC TTCC'CGC TCAQ7GAC 7:GC~3C3C- 

~ " ^ 1 ' — -630 

IG37CG77C3GC 73CGGC3AGC337A7CA GCTCAC7CAAA3GCGG7aa7acGG A7QCAC~GAA7CAGG 

' 1 : 1 ^00 

GGATAACGCAGGAAAGAACa-.3-3AGCAAAAGGCCAG CAAAAGGCCAGGAACCG"AAAAA3GCCG:3T"G 
CTGGCGTTTT-CCATAGGC:c::3:CCCCCTGACGAGCATCACA AAAA'C3ACGC":aaG"CAGaGGT3GC 
GAAACCCGAC AG 3 AC 7aT.i.:.:Ga7ACCAGGCG777 CCCCC7GGAAGC7C 33 TCG "3CGC ~C7CC~577CC 

GACCC-GCC3CTTACCGGA7^C:T3TCCGCC"TCTCCCTTCGGGAAGC3 T3GC3:T"CT:AATGCrCA 

ApaL I 

CGCTGTAGGTArcrCAG'7:G3-3TAGGT;GTTC3CTCCAAGC-33GC-3T G-GC^CGAACCCCCC37TC 

AGCCCGACC3CTGC3CC-'-l-::33TAACTArCGTCT-GAG7CCAACCC33TA AGACACGACr-ATC3CC 

~~ ' ' " — " -J 20 

AlwN I 

ACT3GCA3CAGCCAC7G'3 7-AC.A33A 7TAGCAG AGCGAGSTA'GTAGGC 3GTGC "-C A3A37~C~T3AAG 

' " ■ ■ 539C 

:GGTGGlCTAACTACGGC"^CAC7AG AAGGACAGTAT7'337A7:T3CGCTC-GCT3AAG:CAG7TaCC 

" ■ - — — ■ •> — Zl Zl — : 5U6O 

y C3GAAAAAGAGTT5G"A3:"C"73A7 CCGGCAAACAAAC:ACC3:TGGTAGCGG:3G~^TT— 5TT7G 

1 1 ■ , 5530 

CAAGCAGCAGa:TACGCGCAGAAAAAAA GGATCTCAAGAAGA7C:TT-GATC"" r:-AC33GGTCT5AC 

— ■ 1 ; ■ 5600 

BspH I 

GCTCAG:S3AAC3AAAAC7:aC5TTAAG GGAT7TT3G-CATGAGATTA":aaaaaGGA-CTTCACCTAGA 

— ' 1 — ■ • ■ 5570 

7CC7T^7AAATTAAAAA-3^AG"777 AAA7CAATCTAAA3TA-ATATGA3TA^ACT7GG7CTGACAGT'A 

' — ' — 1 1 1 «">qn 

CCAA:GCTT^A7CA3TGA33CAC:: ATCTCAGC3ATC"7:-ATTT:G-7:A7»"ATAr--T3rcr3AC-C 

r " — — ■ — — 5sio 

C C 3 T C G 7 3 7 AG A T a A C -J c - " - C ^"h^'r, *i" T T ' rr-: T — r- r. r * - r - ■ . - - * ▼ . - 

~ - — — : 555C 

^CCACGCT:aCC35C7C:^a-7TA^CAGCAATAaaCCA3CCA3C:GGaagGG-.:3A.3C3CaGAA G7GG 

g < 7 ' : ^ >_ 333 a- 33 t i 3 7 a ag t-g"~c3cc- 



c:tgcaac7'-_7:cgc;7:: ■ 7::ag7:ta^7j _ 
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Fsp I 

G'TAATAGTT'GCGCAAC 3~~" T3CCATTGC7ACAGGC A7CG7GGTG~CACGC~CG TCGTTTGG TATGG 



C^TCATTCAGC'CCoG"*:: 


:aaC3A7CAAGGC3AG7"ACA-GATCCCCCATG7T3TGCAAAAAAGCoG7 


~AGC TCCTTCGGTCC'CCGA 


Pvu 1 

~:G7T3~CA3AAGTAAG"3GCCGCAG~GTTATCmCTCA7337TAT33CA 


GCAC TGCATAA'TCTCT-AC 


Sea 1 

"3"CATGCCArCC3TAAGATGC7-TTC7G TG AC TG 3 TG AG TAC TC AACCA 


AGTCA77CT3AGAATAG7G7 


A"GC3GCGACCGAGTTGC7C7"GCCCGGCGrC AAT ACGGGATAATACCGC 


GC C AC AT AoC AG AAC T~~AA 


-AGT3C7CATCA7rG3AAAACG77CTTCG3G5CGAAAAC7:TCAAGGA7C 


*"ACCGCT3T"3AGATCCA3 


ApaLI 

""C3A7G7AACCCACTCG*3CACCCAAC7GATC7-CAGCA7CTTTTAC7" 


"CiO:AGC3r":T3£G~G- 


}CAA&AAC AG3AAGGC AAAA7GCCGCAAAAAAGGG AA7AAGGGCGACACG 


GAAA"C- TGAA'ACTCA'iC" 


Ssp 1 BspH 1 

':"rCC7TT7TCAA7A77AT7GAAGCAT"TATCAG3577ATTG7CTCATG 


AGC3GA7ACA"a77TGAm73" 


■^"T7AGAAAAA-iAACAAA7AGGGGT"CC3CGCACA7T7CGCCGAAAAG 


Sal I 

~GCCACCT3ACG7C3ACG3-~ 


Bgf It Sat 1 

C: 33 AG A TC T CC C 3A _ :C C :'A"3GTCG AC TC'C AG 7 AC AA7C7GCTC 


GA7GCCGCATAG7TAAGCCAG 


Alwn 1 

"ATCTGCT::C"GCT"G"GTG*7GGAGGTCGC'GAGTAGTGCGCGAGCA 


AAAT 7 r AAGC AAC AAGGC 


AAGGC 7"GACC 3 ACAA"~GC A~GAAGAA TCTGC TAGGGTTAGGCoTT"* 


Nru 1 

*gcgcgc7t.:g;ga-g-ac3 


Mlu 1 Spe 1 

GGCCAGA T AT AC 3C3 *"GAC A""GAT ~A TT3AC ~ AG 7 7A 1" TAATAGTAA 


'C AA " *AC G 3C-3 7 : i " * i 3 7 ' 


2ATAGCCCArA~A TGG^GTTCCGCG F~ AC AT AAC TTACGGT AAA TG3CC 


CG CC "GGC 7 3 -C C 3 CC C - - 2 


-•::::cG:::A'T3^L.:-cA,:-iiTGACGTA7G--:cc-TAG7AACGCC 


^ATAGGG-C T"CCA "*G ACG " 


C -ATGGGT3GACTA " _ ACGG" AAAC "GCCCAC TTGGCAG 7aC ATCAA 


Nde I 

G7GTA7CATA-G::iAG"AC3CC:CC7MTTGACGrCAA'GACGGTAAATGGCCCGCC7GGCATTATGCCC 



AGTACA7GACC" 
7792 
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Sstl 

"A^AGCAGAGCTCTCTGGCTAACTAGAGAACCCACTGCTTACTGGCTTATCGAAATTAATACGACTCAC 70 

Ppal Hind III j p am hi . Spe I ; Xma III Ec ofl I 

TATAGGGAGACCCAAGC ^^^^" a ^C3AGCTCGGATCCACTAGTAACGGCCGCCAGT3TGCT5GAATTCTG I ho 

R P p V C V M S 

P9' « Mlu I 

CA6A7CTT6GC:ATCAAATTG5T3AACrTC5ACGC3TCATTGGAGACTCCACAACCArGATAACCA3CCA 210 
- ocLRRV IG0ST7m;tsh 

TCCAACTGACATTCTTACTTCC-CAACTACAATCCGAATGTTCATGCACGGTGCCG^CAGAGTCGCoTA 280 
prn' | L r--T - ,a ' V! " MHGA -C$R7 
^ O'LTc iT iPM -MSGAiOSR -; 

bACAGTCTGGTCCTTGArATGCTTCTTCCAAAGCAAATGATTCTCCAACTCGTCAASTCAATTTTGACAG 350 

0 S - ^ - 0 M - !- = K 3 M ! L Q L V < S ;' ^ - 

Bbv II 

AGAGACGTCTGGT3T7AGCTGGA3CAACTGGAATT3GAAAGAGCAAACT3GCGAAGACCCT3GCT5CTTA «20 
tR,L/L A3AT Si3KSK!.AKTt. , ' , v 

ER: ' L ' /| - A SATGI5i<SKLAKi : 11" 
* su " Bsml 

TGTATCTATTCGAACAAATCAATCCGAAGATAGTATTGTTAATATCAGCATTCCTGAAAACAATAAAGAA «90 

v s ' 3 7 ^ c i -: o s : v n : s i p -: N n k -- 

7 S 1 " T N C S £ 0 S r V N ! S- I P £ " N m K £ 

Nsf.'" Xba, 

GAATTGCTTCAAGTGGAACGACGCCTGGAAAAGATCTTGAGAAGCAAAGAATCATGCATCGTAA7TCTAG 560 
tL - 0VER; 'LEKlLRSK£SC:vlL 
EL - 0VER:> i-EK:iLRSlCESCIVIL 

A '*frATCCCAAA a 4A7CGAAT- 3W TTTSTTGTArCCGTTTTr S CAAATGrCCCACT7CAAAACAACGA 630 
° * P •■ N R ! 4 v v S V F A N V P . 0 M U E 

• * " « A r V V S V -- A V V P . Q N £ 

EcoR V 

*G37CCATr7GrA5TATGCACA5TCAACC3ATATCAAArcCCTSASCTTCAAArrCACCACAATTTCAAA 700 

1 P . ! v v c " v N p Y ' ? £ l 0 i h h » f •: 



p f v v : - v 'j p r c i 



E L G I -! H x l 
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ATGTCAGTAAT6TC5AATCGTCTCGAAGGATTCATCCTAC3TTACCTCCGACGACGGGCGGTA6AGGATG 770 
MSVMSNR.EGFILRYLRRRAVED 
MSVMSNR *. EGFILRYLRRRAVEO 

Sstf 

« 

AGTATCGTCTAACToTACAGA-GCCATCAGAGCTCTTCAAAATCATTGACTTCTTCCCAATAGCTCTTCA 3«0 
EYR LTVC M PSELFKI I 0 F F P I A L G 
EYPLTVOVPSELFKI lOFFPlALQ 

EcoR I 

GGCCGTCAATAATTTTATTGAGAAAACGAATTCTGTTGATGTGACAGTTGGTCCAAGAGCATGCTTGAAC 910 
AVN NF IEK TNSVDVTV6PRACLN 
A V N N F it< T NSVOl'TVGPRACLN 
Bam HI 

TGTCCTCTAACTGTCGA7oGA*CCC3TGAATGGTTCATTCGATTGTGGAATGAGAACT-CATTCCATATT 980 
CPl rv05SR£W c : ! ?LWNENrI?Y 
CPLTVOGSREVrlRLVNENF I^Y 

Aft III Bam HI Tth I 

TGGAACGTGTTGCTAGAGATGGCAAAAAAACCTTCGGTCoCTGC ACTTCCTTCGAGGATCCCACCGACAT ^050 
LERVAR03KK TFGftCTSFEDPTD ! 
L E 3 V A R C 3 K '< TrGRCTSFEDPTO I 

Bbv II 

cgtctctaaaaaatggccgtggttc3atggtgaaaacccg3agaatgtgctcaaacgtcttcaactccaa \ 12c 
vskkvpvfdgenpenvlkrlqlo 
vskkvowfdgempenvlkrlqlq 
Jth I Xho I 

GACCTCGTCCCGTCACCTGCCAACTCATCCCGACAACAC TTCAATCCCC TCGAG7CGTTGATCCAATTGC 1 190 
DLVPSPAN5SRQHFNPLESL IQL 
OLVPSPANSSROHFNPLESL IOL 

Bbv II 

ATGCTACCAAGC ATC AGACC ATCGACAACATTTGAACAG AAGAC TCTAATCTTCTCTCGCCTCTCCCCCG 1260 

HATKHOTiDNI 

HATKHOTIDNI 

ctttcc"tatctt:5^acc3G"ac:tgatgatt:c:ca"tt t ccc:ct'tt:cccc:aatttcccagaac -3:c 

Xma I 
Sma I 

CTCCTGTTCCCTT7GTTCCTAGTCCTCCCGGG"GCCGACGCCGAAGCGATTTAAAAACCTTT t TCTTTCC iuoc 
Xmn I 

GAAACATTTCCC A TTGC'C A T'AATAGTCAAATTGAATAAACAGTGTmTGTACTTAAAAAAAAAAAAAAA 14 70 
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Sst I Bel I 

AAAAAAAAAAAAGGGCCCTA7-CTATAGTGTCACCTAAATGCTA5AGCTCGCTGATCAGCCTCGACTGTG '5^C 

ccttctagttgccagccat;-c-ttgtttgcccctcccccgtgccttccttgaccctggaaggtgccactc «6;C 

CCACTGrCCr*"CCTAAriiAArGAGGAAArTGCArCGCArTGrCToAGTAGGTGTCATrCTATrCTGGG 166C 

Bbv II 

GGGTGGGG73GG3CAGGACAGCAAGGGGGAGGATTGGGAAGACAATAGCAGGCA7GC7GGGGATGC3G7G 175C 
GGCTC7AT33C~73TGAG 33GGAAAGAACCAGCTGGGGCTC7AGGGGGTATCCCC ACGCGCCC7GTAGCG 162C 
GCGCA7TAA5C-3C35CGG3 7GTGGTGGTTACGCGCAGCGTGACCGC7ACACTTGCCAGCGCCCTAGCGCC I69C 
CGCTCCTTTCGCTTTC""::CTTCCTTTCTCGCCACG77CGCCGGCT77CCCCGTCAAGCTCTAAATCGG I96C 
GGCA7CCCT7"mGGG7"C:3A7TTAGTGCTTTaCGGCACCTCGACCCCAAAAAACTTGA7TAGGGTGATG 203C 
Ora III 

G7TCACGTA3~333ZCA*C3CCCTGATAGACGGTTT7-C3CCCr7TGACGTTGGAGTCCACGTTCTTTAA 2 TCC 
"AG7GGAC73 3TTCCA--C"G3AACAACACTCAACCCTATC~CG3TCTAT7CTTT7GATTTATAAGGG 217C 

Xmn I 

ATT7 T GGGGA""TC3GCC"^""GGT7AAAAAATGAGC7GA7T7AACAAAAAT77AACGCGAATTAATTC7 22^ 
G7GGAA7GT3"3 7CAG7"-3GG73TGGAAAGTCCCCAGGCTCCCCAGGCAGGCAGAAG7ATGCAAAGCA7 23 1C 

Ava III 

Nsi I 

GCATC"C AA7* A3TC AGC--CCA3GTGT33AAAG7CCCC AGGC^CCCCAGCAGGC AGAAG TA7GC AAAGC 226C 
Ava III 
Nsi I 

ATGCA7CT:aa"TAG7CA3CAACCATAG7CCCGCCCC7AACTCCGCCCATCCCGCCCC7AAC7CCGCCCA 2C5C 
GTTCCGCC:^"C7:CGC3:CAT3GC7GAC7AATTT77TT7A7T7ATGCAGAGGCCGAGGCCGCCTCTGC 252C 

Xma I 

Stu I \ Sma I 

I i i 

C7CTGAGC T A "~CC AG AAGfAGTGAGGAGGC 77 TTT7GGAGGCC 7AGGCTTT7GCAA AAAGC 7CCCGGGA 259C 

Bel I 

GCTTG7ATATC:ATT7*C33ATCTGA7CAAGAGACAGGArGAGGATCG"TTCGCATGAT7GAACAAGATG 266C 
Xma III 

GA7TGCAC3:^33TTC"C:3GCC3C77G33TGGAGAGGCTAT7CGGC:aTGACTGGGCACAACAGACAAT 27:c 

Nar t 
Bbe 1 

CG3C"GC7C7 3AT3C:g::3"G~TCCGGC73'TCAGCGCA3GGGC3CCCG3 7TC~"TT7G7CAAGACCoAC 2cCC 
C737CC jGT3C*CT3AA"3-AC"3CAGGAC3AGGCAGCGCGGC7ATCG"33C "GGCC ACGACGGGCGT7C 2870 
Fsp I Tth I 

C 7 T3CGC AGC "3 T3C 7C 3-CG77GTC AC TGAAGCGGGmAGGGAC T33C "3CTAT73GGCGAAG7GCCGGG ZI^C 

gc-gga:ctcc"3t:atct:accttgc7cctgcc3agaaagtatcca-catggc7 3atgcaatgcggcgg 30*0 
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CtSJtS^ 3080 
CTC *CA , GGAAG.,oG7C TTG , C3A7CAGGA7GA7C7GGACGAAGAGCATCAGGGGC TCGCGCCAGCCGA 3 ! 50 
BssH (I 

«TG7TCGCCAGSCTCAA6SCGC3CATGCCCGACGGCGA33ATCTCGTCGTGACCCATGGCGAT6CCT6C 3220 

Rsr II 

Ir™^^ 3290 
JScc : -S-~;™J a : G5CTACCC6TGATAr7G = T GAAGAGCTTGGCGGCGAATGGGCT3ACCG 3360 
UCC tGTGL ' TACGGT - .CSCCSCrCCCGATTCGCAGCGCATCGCCTTCTATCGCCTTCTTGACGAG ^30 

Asu II 

cIaJ?™---^™^ 3£C0 
""1:1^::%^ 3570 
CTC^GCSC-uc -ATCTCATia. . 33AGTTCTTCGCCCACCCCAACTTGTTTATTGCAGCTTATAATSGTT 36H0 

Bsm I 

^AAATAAAGCAArAGCATCACAAArTTCACAAArAAAGCATTTTTTTCACTGCATrCTAGTTGTGGrTr 3710 

•Sna I 

2I^ AAAG ^ A ~f AA J G ^"^^^ -TCA7GTCTG 7A7ACC37CGACC7C7AGC7AGAGC77GGCG7AA7CA 3760 

. AAAG G 7AA^„ TGGGGT.c, TAATG AG7GASC TAAC TCACATTAAT TGCG77GCGC7CAC7GCCCGC 3920 

Sbo I 

clll C --C''rrt^ 3SS0 
GuoCGC i»T iCCGv. : ■'--TCGC7CAC73AC7CGC7GCGC7CGG7CG77C3GC7GCGGCGAGCGG ^060 

All III 

Ika^^ <"30 
CCC A C-cS cZTrTrll " AGGAACCGTAAAiAGGCCGCG ' TG CTGGCG7TTTTCCATAGGCTCCGC «2C0 
A?c2c«S«---J-5 ^ AAAA T TCGACSCTCA « TCAGA5 G-GGCGAAACCCGACAGGACTAT A AAGA7 a 2 70 

A " AG ^ GTr /_:::E; GG ^ B3110 

-..CC^CTTTC CooGAAGC37GGCGC77TC7CAArGC7CACGC TGTAGG7A7C 7CAG7~CGGTG (1010 

ApaL I 

rA GS7CGTTCGC7:CAAGCTGG6CT 5 TGTGCAC3AACCC:CCG7rCAGCCCGACC3CTGCGCC7TATCCG UH80 

Alwn I 

:?:"^;i"^:;;^ 7 ^ usee 

•-Ilr- -r^::: rG AGGC3Gr3CrAC - 5A G7rC7-GAA G rGG7GGCC-AACTACGGC-ACAC7 C620 

«6so 

a"*!^ ^rrr^"- C - A :; G t; 33TAGC3G7G3:TTr " TT '"" TGCAAG CAGCAGA7-ACGCGCAGAAA «76C 
AAA«GumTC7l --oAmoA TCX • T5ATC TTTTC 7iCGGCG7C"£ ACGC FCAG TGGAACGAAAAC 7CACG ' M830 
BspH I 

"*»GG-3ATTT-G3T:ATGi3A7T 4 rCAAAAAGGATC77CACC-AGATCCTTTTAAAT'AAAAA-6AAGTT «900 
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™£gc^ 4970 
v. i o I l i ».T T --TTCAT.CATAST.SCCTGACTCCCCGTCGT6TAGATAACTACGATA 50U0 

fpa l 

Pvu ( 

-^**frr Gr7ACATGATCCC::CA -rTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCf- «en 
-GTiAGAAG i ^"^~2GCCGCAu7G77ATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACT5^ SS 

= Sca 1 Sbo I 

SaC^ £530 
-Cc-ACCoAG. .^yCLC«.5 •C"TACG S GA7AArACC3CaCCACATA6CAGAACTrTAAAAGr 56CO 

GCTCATCArTGG,AAACGT7C--:3GGGC3AAAACTCrCAA ^ 

cmCCA»,oCC-p-AA G^.C GCm- A A A«G jo AA 7AAGGGC3ACACGG AAA7G77GAATAC TCA7ACTC77 «=810 
.= sp 1 BspH I 

-AAA-ATAAA.-AA.AGG6U, - CG C-ACATT7CCCC3AAAAG 7GCCACC7GACG7C3ACGGATCGG 5950 

Alwn I 

— CC - T -C.T-TG.GTTGG-v,TCG.T3AGTAGTGCSCGAGCAAAATTTAAGCTACAACAAGGCAAG 6090 

Nrul 

GCTTGACC3ACAATTGCATGAAG AATC73C77AGGG77AGGCG7T77GCGCTGC 7TCGCGATG7ACGGGC 6160 
Mlu I Spe , 

a^CCa^a^ 6230 

C,^..„T ( CA.o.CAArAA. u A.G.AT3 ( -CCCA7AGUACGCCAArAGGGAC77TCCA77GAC37CA 6370 

Nde I 

■t .-.o c,,AC.o ^^^,.. J CCTG3CArT,-GC::AGTACA T GACCr^T3.3.3ACT7-C 651C 
SnaB I 

7m£c£c^ 5650 
G7AGGCG7GTACGGTGGGA^ rGTCGTAACAAC TC CGCCCCAT7GACGCAAArGGGCG 6720 
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S7£ 

GGTCC7GCAAC777A7CC3CC7CCATCCAG7CTA77AA77G7 7GCCGGGAAGC7AGAG7AAG 7AG77CGC 

z ~~ 1 - ' — 1 — ' t— — ' — y 70 

Fsp I 

CAGTTAATAG^TTSCGCAACG— STTGCCATTGCTACAGGCA TCGTGGTGTCACGCTCGTCG7TTGGTAT ■ 

' 1 1 ' — # ao 

.im, 'V^wiULbttl ICCCAACGATCAAGGCGAG7 TACATGATCCCCCATGTTGTGCAAAAAASCG 

— — — — — — — x — 1 ' 210 

: Pvu I 

GTTAGCTCCT-CGGTCCTCCGATCorTGTCAGAAGT AAGTTGGCCGCAGTGTTATCACTCATGGTTATGG 

" ' ' ' ' — 280 

Sea I 

CAGCACT3CA7AATTC7CTTAC-GTCA7GCCA7CCGTAAG A 7GC 777 7C 7G 7G AC TG G7G AG 7 AC TCAAC 

' — 1 ' « - 350 

CAAG7CA77C7GAGAA7AGTG-A7GC GGCGACC3AG77GC7C77GCCCGGCG7CAA7ACG^ATAATArr 

— ' ; ' 1 — L20 

^^CCACATAGCASAACTTTAAAAGTGCTCATCATTGGA AAACGTTCTTCGGGGCGAAAACTCTCAAGGA 

' ' ' ~- 1 ' 1 ■ «90 

ApaL I 

T CTTACCGCTG7T3AGA7CCAG7TCGA7G7AACCCACTC GTGCACCCAACTGATCTTCAGCATCTTTTAC 

Zl ~ ' ' 1 *— ' ■ 550 

■TCACCAGCG777C7GGG7GAGCAAAAAC AGGAAGGCAAAATGCCGCAAAAAAGG3AATAAGGGCGACA 

630 



Ssp I BspH I 

CGGAAA7G77GAA7AC7CATAC7C77CC77777CAA7A77 A77GAAGCA7T7A7CAGGG77A77G7C7CA 

Z ' ' ' ' — — 700 

? GAGCGGATACATA777GAA-G7AT77AGAAAAA7AAAC AAATAGGGG77CCGCGCACAT7TCCCCGAAA 

' ' ' 1 770 

jSall Bglll sail 

A GTGCCACCTGACS7CGACGG ATCGGGAGATC7CCCGA7CCCC7ATGG7CGAC7C7CAGTACAATrTRfT 

' ■ ■ — = euo 

Alwn I 

CTGA7GCC3CA7AG7TAAGCCAGTATC7GC7CCCT GC77G7GTGTTGGAGGTCGCT3AG7AG7GCGCGAG 

~ ~ 1 — ' ' 1 — J ~ ■ gjo 

AAAA >TAAGCTACAACAAGGCAAGGCTT3ACC3ACA ATTGCATGAAGAATC7GCr-AGGG77AGGCGr 

~~ ' ' 1 - J — — 980 

Nru 1 Mlu I spe I 

"T3CGCTGC""C3CGArGTACG3GCCAGArA7AC3CGTT GACA7TGArTAT7GACrAGTTAT-AATAG: 
~~ ~"* 1 " 1 ~— — 1 1 : — 

AATCAATTACGGGGTCATT^G-^TAGC-ATATAyGGA GTTCCoCGTTACMTAAC-TACGG-AAATGG 

~ ' ' ' ■ \ \20 
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SnaB I Nco , 

CCA5TACAT3ACCrTA7GGGAC'TTCC' A CT7G5C A GTACATCTACG7ATTAGTCATCGrTAT-^^^ 



GTGA-GCGGTTrTGGCAGT.CA-CAATGGGC G-SGATAGCGGTTTGACTCACGGGGAT"CCAAGTCTCC 
AC CC CAT TGACG ^^ AA ^"^^^'^^G^""^7'GG C ACC AAA A rCAACGGG AC TTTCC A AAA TGTCG~AACAA 
£J^^^^~ ''*''' ^**CGC A aa"£G5CGGTAGGCG T3TACGGTGGGA3G7CTATATAAGCA<;A:;r~" , T'*Tr:r: 



'320 



: 540 



'610 



CTAAC-AGAGAACCCACTGC-ACTGGCTTATCGAAATTAATAC^r-r^ 



r Hind (II 
CTATAGGGAG ACCCAAGC77 



• L i N 3 L ^ T G - 
Asp 718 6am HI 

K P nl Spel Xmalll 



< L : L T ; g a P < L 

BstX I 

EcoR I Pst I 

GGTACCGAGC7C3GATCCAC "AG TAACG 3CCGC C AG 7G 7 GC 7GGAAT7P inrJr.A rrr.rr ^ 



£ L 



S N 3 R 3 c A 
Ava I 
Xma l 
Sma I 



:680 



!750 



° • - 0 I A A A 



CAACCrTCGSACAACATTCGC-AASArcCCCGGSATATT/ 



Pvu II 



A TCC T A T 7C TCC ACAC TTATCAG'G TCAGC 



S P H 



S V S A 



5 ■ r G 0 H S . R S P G Y S S 

Spe I 

Sal I 

TGATAAGGACACAATGTCU-GCACrCACAGACTAGTCGACGACCT 

* "-S O S R R ? c c c k s . 

<-C T.^.T-C «. , a , aaA 3C:ACCTTCAAG A 6TT: A c A - = = . ccgAfifArAa ^-,.-. p ; r . 



TC T TCACAAAAACC AAGC *" AT TC A 



!820 



390 



C £ " 
0 E r 



* 960 



h R v a 

h R V £ 
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2030 



Ava I Bam HI 

CTCTCT-3A6CCC3AGAC5GGTGCC3AACTCGAT3TCGAAATATGATTCTTCAGGATCCTACTCGGC3CG 

AL.SPRRVPNSMSKYOSSGSVSAR 
AL.SSRRVPN-SNSKYOSSGSySar 
Ava I 

"7CCCGAGG7GG AAGC'C T^C~G37aTCT~TGGASAGACS77CCAAC7GCACAGACTA7CCGA7GAAAAA 

SRG3SS r G I YGETFOLHRLSOc< 
SRG3SS~G I YGETFQlHR(_SDE< 

Bam HI Nde I 

I I 

"CCCCCGCACATTCTGCCAAAAGT3AGATGGGATCCCAACTATC ACTGGCTAGCACGACAGCATAT33AT 
' "~ ~ — ' — — 1 — — ~" 1 1 1 ' 2 1 70 

spahs akse vgsolslast'ayg 
sp ^hs-ksemgsclslasttayg 

Sail 

C7CTCAATGAGAAGTACGAACAT3CTATTCGG6ACATGGCACGTGACTTGGA6TGTTACAAGAACACT-GT 

. . . , 1 , , , . 2210 

sl x je< v ehairomardlecyk^tv 

3 L *J E < v E H A I R D M A P D L E C Y K N 7 v 

Hind III 

CGAC'CACTAACCAAGmAACAGG-3AACTA7GGA3CA7*GTTTGATCT7TT7GAGCAAAAGC7TAGAAAA 
. . i ■ ■ 2310 

35L7K<GENY3ALFD : -FE0KlR< 

DSL TK<C£NYGAL. tr 0LFE3KLR< 

Cla I 

C7CAC"CAACACATTGA7CGA7CCAACTTGAAGCCTGAAGAGGCAATACGATTCAGGCAGGACATT3CTC 
— ■ l- . , , 2380 

L70HICRSNLKPEEAIRFRQOIA 

-*0H ICRSNLK^EEA i 3 F 3 Q D I A 

ATTTGAG3GA7ATTAGC4ATCA7C7TGCATCCAACTCAGCTCATGCTAACGAAGGCGC"GGTGAGC7TC7 
. . ■ ' ■ i 2^50 

t-L^C ISN-ILASNSAHANEGAGE-L 

HL^D I S N H L A 5 N S A H A .N £ G A G E . L 

Cla I Cla I 

~C3TCAACCA T C 7:TGGAArCAGTT3CATCCC '*TC3A7CATCGATGTCArC3'CGT;GAAAAGCA3CiAG 
— . . . 1 . ; ■ 252? 

n C P S L i 5 A S H R S S M S i S 5 k S 3 < 

5 P S L £ r V A 5 H R S S M S 3 S : K 3 3*' 
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Bam HI 

CAGGAGAAGA7CAGCTTGASCTC3TTTG5CAAGAACAAGAAGAGCT GGATCCGCTCCTCACTC7CCAAGf 

"' ' 1 ' — — 2590 

3E<: $L$S' r GKNKK:SVfRS3LSK 
0E< ISL ^5F3KNK<SV!RSSLSK 

We I BspM II 

'C"CCAAGAAGAA3AACAA3AACrACGACGAAGCACA7A73 CCA 7C AA T 7 TCCGG ATC ~C AAGGAAC7C~ 

— ' 1 1 — ?660 

FT<k<NKVY5£ahMP5I sgscgtl 

ApaLI 

"GACAACATTGArGTGATT3AG"7oAAGCAAGAGCTCA AA3AACGC3AT&GTGCACT^ ACSAAGTCCGC 

~ " ; 1 ■ ' — 2730 
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Figure 18 : Phase contrast images of MCF-7 cells transfected with pCB201 (upper) 
compared to mock (control) transfected MCF-7 cells (bottom).

The control cells are spread out on the tissue culture plastic and exhibit few
filopodial outgrowths. The transfected cells appear smaller because they are slightly
rounded up and have multiple filopodial outgrowths (arrowhead) per cell.
Figure 20 : F-actin pattern (visualized with TRITC-Phalloidin) of
MCF-7 cells transfected with pcDNA3.LacZ (top panel) and with
pCB201 (middle and lower panel).
Figure 22 : Phase contrast image of N4 neuroblastoma cells, transfected with
pcDNA3 (22a), pCDU4 (22b), pCDU3 (22c), pCDU2 (22d) and pTB72 (22e).
Figure 24 : Phase contrast images of small, medium-sized and
large foci induced in a monolayer of NIH-3T3 cells by
transfection with pCB201.
Figure 25a, b, c : Chromosomal localisation of Hu-unc-53/1 by FISH
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Figure 26 : Expression pattern of Hu-unc-53/1 and Hu-unc-53/2 in normal 
human tissues and cancer cell lines by Northern blotting. 
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n"CAGAAATCACACATTTAAAATTTlTAAATTTTCCTCAA^ 5^ 

TTTTCTCAATTTTTTTCCGGAAAATXCrrGAATTTTTTGAT 57M 

TTTnCAAJWCTAG^TTTATTTGTGAGTTTTCAAAATATTT^ 3fl08 

GAATAAATTAAAATGTATTTTAJIAATATGTTTTGTAAATTCACAVU^ 5900 

AATTTAfiAAATTTTTATAGAGTATTATTTTATTT^ ^ 
fig 27 pNP3 Map (1 > 13621) Site and Sequence

GAACCn'CTACA^lAAAATAATTGTACT^ 6100 
TCACCCWCCCUVAACATAAGTAAAGTTCACAAATAAACCTACC^TTTCTC AACAACTCCGATTCTTGGCTCCGTG 6200 

TUCCT«KGGAACA*CTeGA«GT^ 6300 

AATATACKAAGOTCnAGGAAGCCGAAGCtt^^ WW 

OOJUlAATCUUUC^CCCJUWGCCTATTCJUmCM 6500 

TGCAACTACTCGGGCCAAACMTATGTACATGGTTCAGGCCGGC^^ 66G0 

AATTTAGCXCAAACTTGACAAGAAGCCTCTACCCAG^^ 67W 

TawCTOCTaTJUKTCTCAttATGGAaCOT 6880 

TATAGMGGGGACGGTTTCAAATGGGATAAATTGAAACT^ 6909 

HI I At I JUM IV.UIMHUV. I n.n.MVWWw.^nwww!. .... — 

AAACATTTTTTCCTWVATATCTTAAACTTTTAACATCAAT^ 7188 

CCT AAAAATCTATAAAaA<WTCC7CTKA«5 7298 

AAttTCCCJ^ATCTOUUXTr^ 7388 

TCUTCATWCOTMCGACCTGaOTCAATCC^ 7488 

JGCGTCCOTTTTACGGTAGGAAACTrrTGGCTGGAAATTACTATGTAAAAAAC^ 7S *> 

TCATAAATMTAGAAGCCATAAGACAATTCCGGAAAGTCAAAAGAATCTCGTATAT^ 7688 

M ™ T ATGAAAAAAATAA»TAaAGAAAa<UAAMCT^ 7708 

AaCGTASTAGCTCATTmWAAACWTOU^ 7880 

CGACin'CGAAAAATOGrCCGAATATCACTTCCCTTGCAACATTTn'CCCCCAAGACTCTrrATCCGCCfiACACAACGACAA 7S8B 

TflUJUULCTatttOttQUUfflU^^ 8888 

CSGAAGAACTACGAATCGGGTTCTAAATATAGATtoMCCCAGAGCTrCGCAAAA ,188 

CTCACn"GTCTTTGGTCCTATACAAAAATTATCAACTGTUUttATAT^ 4298 

CrTCAAAGTAAjUTAGTGGGATGCAATACTATCAGAGGAAAAAATCTTATTTTTM •*» 

TTCCGATTrCAAATACTCACCGTACAAAGCTTCMCATAGACGTGGCACTTCCCAGC^ ««" 

nTTCGACAAAGTTTSTTGTTGAGGCACCGCCGCCKCTTTCTCGTAGGTTGTGGGCT 8580 

mUGCATTCCACCACaCUCUCCGCTinTATCCG^ * M * 

GCGGCTAGCTTTGAGCTTCCGATTTTTGTACTTGTAGCMCACIWCTACCTGGGTT^ 8708 

SjnTCGAAATAGAGCTGTACffrTGATCATGATTCTGAJWTM »«« 

TATCCTCGAGCCAACATTATTATTGCCACTCGAACGGCTCGACGGACGGAATGAATTTGTATTA^ WW 

CCAGACCTOTrGGCTTGATACCAATCTnGATGAATCTCTAATTGAAA 9889 

AAGTAA(^GAAACTGGAAACATTTAAAATTGAAAACTCTCTTGGA AA TCTAG7CTGGAT7AGACTGAAGCAAGC AAGTCAATCTAGGCTTA^UWTAAAT 9100 

GCAACTCATTTCTCCn , CCKTTMGCTT*TGCCCA*CTAAG*TCrAGAATAA66CTATA<rnCACCCTT*CCAT^ 9209 

CrAWTTTAn^GCCACTTCACTTrrCTCTr(>CCCCAATCTTCCCTAAACATCTTCAAAACCTT7AAATACA^ 9300 

TTTCrTTTTTTCTrCTTTTCCAAATCTACATCTACM 9400 

TAAATATAaAAATmcCCacmWCCaaCACAAAAAATAGGCWGAGTCAAAACGAACW 

AGTAT6AGC<rT(UATTTGA6ACC6AACGGACGAGATGAAAAGG<n"CACACGCCATGr<iGCTCAGTATCTCCTTITCTrAAAAA4 9600 

CTATACnTrTATGATACGAGTGAGATGGATCAAGAGCrGAGAAAGTTTTTrAATGGGtCAAAAAGCATGGAAAAT^ 9700 

AGATTGAATTrGATAMTAATCTCTCCTFGAATTAAAATrCTAACTGCTTGCACTTTATTACGCGTGCCCAATAATTATX^ 9t00 

GGATATGTGCTGAGTAGCGATTCATAGCATGGGAAATGTGAIUCGGAGCTTCO 9908 

ATAAAGATATTGTtAACAGTAGAAATAAAATTTTAGtCATACAATrTGTTAGGGAATATTTATlGArrrrT^ 1B8W 

AATATCCrTTrrrrAAATAAACTTTTAAAACTCGTGTTTAlTCOAATTGCACAAAAT^ 1010? 

CCACTTTGTGCTAACCCAGAATGAGTGGCGGAATAGGAAAGAGCGCATAAAACCCGACATTAATTGTCAGTGATGAGaGCGGGGGAAG^ 1020C 

GTIGAGACGATGAAAAAGCAArrCTACACTCATACtAOACCAKCAGTGTCATCGGOCTATCTTTAn'CATCAATTTCCGAW 18J « 
TCTCAUCGAjrrAAnGACCGTaCeCCCTCTAGm^ 

GAGATGGA^GGUGGAGGAGGCGGUGAAGGAGGGAGAAWGGGACWCCTGTCCCTGTCAA^ 

AAAAGTCATTAIGAAGCCGAGAAATTGATTrrGGTGGGCACTTrrCGGGCAAGGGGAGCCGATTTGTAAATTGCAAMTT^ 10C0t 

MCAT6GTG6C6GTGAG5GTrrGAAATA<nTrrrrTrTACTTTrrCATACAAAAAGGAAGGTTCT 107w 

TGTCrrAAACATTTGAGAATAKACAAATTTGTTTCAGArCATTATTTCCAGTATAGACTTTKGCTCCTTC 1080C 

TTGCATCTATTACAAAGCrCTCAACTGACAAAATTTCTCCAAMTACTTrrCTACATTTGTATAGTGGAAAATrr^^ 109W 

ATATAAGCCGTCATArTAGACCAATAAAGGTOGAAGTTTTGCAAAAAAACTACATTnTATCATCTTCTTTTCTGGGCAT^ 1100( 

AAATAAWTTOTACCAATTTTCGATATTCTTGACTCT6GAGTCTGAAGCCTGGAT6TTGACATTTGTGGAAAG 1110? 

GGTTGCTGACGTGGCGACACGTGGCGAGGGTAATCrGAAAATCKiAATTGTATrGCAACITTTGTAAWTCTAGT^ 11201 
GGCAAATAAGCAAAGTAAATAATGTTCCTrnAATAlTTTCTGWT^ 

TrAAAGMCACATTTTTGTAGTCGTAAACTCAAAATTAAACTCACTTAGAMCCGCGflGTGGCATAArGGATGTGGGTAG^ 1140C 

CTCGAGGGGGGCCCGGTACCCAGCTTrrGTTCCCT/rTAGTGAMGTTAATTGCGCGCTTGGCGTAATCATGGT^ llS0t 

ATCCGCTCACAATTCCACACAACATaCGAGCCCGAAGCATAAAGTGTAAAGCCTGCGGTGCCTAATGACTGAGCTAACTCACATTAATTGC 1160t 

* CT <««<™OlGTCG<*AAACa(^^^ la70e 

TCCTCGCTCACTGACTCGCTGCGCTCGCTCGnCGGCTlXGGCGAGCGGTATCAGCrCACrCAAAGGCGGTAATACCCTrATCCA^ 11802 

ACGCAGGAAAGAACATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGWCGCGTrCCTGGCGTTTTTCCA 1298! 
GCATCACAAAAAICGACGCTCAAfiTCASAGGTGGCGAAACCCGACAGGACTATAAAWTACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCT 12002 
OTCC6ACCa«CGOTACC<tfATACaCTCCGCCTrTCTCCOT 12l« 
AGGTC<rrTCGCTtCWGCTCGGa<TrGTGCA£GAACCCtt^ 122CC 
ACACGACTTATCCCCACTWCAGCAGCCACTQGTAACAGGATTAGCACAGCCAGGTfcTCT 

GGCTACACTAGAJKiGACAGTAT^GCTATOCCGCTCTGCT 124« 
CrGGTACCGGTG G I 1 1 M l T GTTTGCAAGO^GCAGATTACGCGCACJ^AAAAAAGGATCTCAAGAAGATCCTTT G ATCTTTT CT AC G GG GTCTGA CGCTCA X25« 
fn*€CAM!GAAMCTCAC<nTAAGGGAT^ 12«W 
TAAAGTATATATGAGTAAACTTGGTCTGAWGTTACWA 

GACTCCCCCTCCTGTAGATAACTACGATACGGGAGGGCTTACCATCT^ 12 WW 

ATCAGCAATAAACCAGCQ»GCCGGAAGGGCCGAGCGCAGAA&TGGTCCTGCAA<TTTATCCGCCTCCATC^ 

CTAACTACTTCGCCAGTTAATAGTTTGC GCAACCTTCTTCC CATTGCrACAGGCATCGTGCTCTCACGCrCGTCCTTTGGTATGCCTTCATTCAGCTCC G 
GTTCC G^ACGATCAAGGCGAGTT ACATG ATCCCCCATGTT GT GC AAAAA AGCGGTT AGCTCCn^CGGTCCTCCGATCGTTCTCAGAAGTAACTTGGCCGC 13106 
ACTCTTATCACTCATCCTTAT6C CA GCACTGCATAATTCTCTTAO CTCATGCCATCCGT AAG ATCCTTTTCTGTG ACTGGTGACTACTCAACCAAGTCA 
TTCTGAGAATAGTGTAtGCGGCGACC&A(rrrCCTCTTGCCCGGCGTCAATACGGGATAATACCGCGCC^ 13345ft 
CAAAACCTTCTTCGGGGCGAAAAC fCT CAAGGATCTT ACCGCTGTTG AG ATCCAGTTCGATGT AACCCACTCGTGC ACCCAACTG ATCTTCAGOTCTTT 134CC 
TACTTTCACCAGCGTTTCTGGGTCAGCAAAAACAGGAAGG^^ 1350C 
CTTTnCAATATT AT^ W AGCATTT ATCAGGGTTATTCTCT WTGA GCCG ATAC ATATTTGAATGTATTT A GAAAAATAAACAAATAGGGCTTCC GCGCA 
CATTTCCCCGAAAAGTGCCAC 13621 
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SIGNATURE SEQUENCES 

Different signatures used:

Amino acids are indicated in one letter code
X equals any amino acid

X(3,5) equals 3 to 5 X's

(D,E) means D or E at a given position 

l ° — 8 a „e< gh t ma tri.v 

BLOCK A : 
Sx?l Q xf^ 
BLOCK B : 

KXKKSWXXXXXXXXFXK 
BLOCK C : 

VKRTFRPA TFF . Ufn y 

BLOCK D : 
LARCF FAKffl y. 

^ERTERRATFF^ fflY . 

BLOCK E : 

GXXGXGKS/T 
and 

F<K. R) MXXXSN.XO. 8) 0FaL.VXI.L,VH R /K,V ( ,x,VX R .KX R .K, (R ,K«V (DE , 

BLOCK F :

(W/F) (D/E)DSSS(V/L/I)SSGISD(T/N) 
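The signature notation defined above maps naturally onto regular expressions. The following is a minimal sketch and not part of the original disclosure: the function name `signature_to_regex` and the test peptides are hypothetical, and it assumes that X is any amino acid, X(m,n) is a run of m to n residues, and a parenthesised list such as (D,E) or (D/E) gives the alternatives at one position.

```python
import re

def signature_to_regex(signature: str) -> str:
    """Translate a signature such as 'GXXGXGK(S/T)' into a regular expression.

    Assumed notation (taken from the legend above):
      X        any amino acid
      X(m,n)   m to n arbitrary residues
      (A,B) or (A/B)   one of the listed residues at this position
    """
    sig = signature.replace(" ", "").upper()
    out = []
    i = 0
    while i < len(sig):
        ch = sig[i]
        if ch == "X":
            # X(m,n): a run of m to n arbitrary residues
            m = re.match(r"X\((\d+),(\d+)\)", sig[i:])
            if m:
                out.append("[A-Z]{%s,%s}" % (m.group(1), m.group(2)))
                i += m.end()
                continue
            out.append("[A-Z]")
            i += 1
        elif ch == "(":
            # Alternatives at a single position, separated by ',' or '/'
            close = sig.index(")", i)
            options = re.split(r"[,/]", sig[i + 1:close])
            out.append("[" + "".join(options) + "]")
            i = close + 1
        elif ch.isalpha():
            out.append(ch)
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} in signature")
    return "".join(out)


if __name__ == "__main__":
    # Nucleotide-binding motif, written here with the alternatives in parentheses.
    pattern = signature_to_regex("GXXGXGK(S/T)")
    # Hypothetical peptide used only to exercise the pattern.
    print(pattern, bool(re.search(pattern, "MAGPPGTGKTLL")))
```

Alternatives must be written in parentheses for this sketch; a bare "S/T" as printed in one of the blocks above would need to be rewritten as "(S/T)" first.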
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Enzymes : 72 of 146 enzymes (Filtered) 

Settings: Linear. Certain Sites Only. Standard Genetic Code 
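The settings above describe a scan of a linear sequence for unambiguous restriction sites. As a minimal sketch of that kind of scan (not the software that produced this map), the code below searches a linear sequence for a small table of recognition sequences. The function name `find_sites` and the test sequence are hypothetical, and the table holds only three well-known palindromic sites (EcoR I GAATTC, Bam HI GGATCC, Hind III AAGCTT) rather than the 72 enzymes used for the figure.

```python
# Illustrative subset of recognition sequences; all three are palindromic,
# so scanning one strand is sufficient.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}

def find_sites(sequence, sites=SITES):
    """Return {enzyme: [1-based start positions]} for a linear sequence.

    'Linear' means matches are not searched across the end-to-start junction.
    Enzymes without a site in the sequence are omitted from the result.
    """
    seq = sequence.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start + 1)  # report 1-based coordinates
            start = seq.find(motif, start + 1)
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    # Hypothetical test sequence, not taken from the map.
    print(find_sites("AAGAATTCTTGGATCCAA"))
```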

JBgl I 

i 

TAGTTATTAATAGTAATCAATTACGGCGTCATTAGTTCATAGCCCATATATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGftCCu 

i ' , \ . — — — * i >— — 1 i 1 ■ 1 ' 1 ' — 1 1 : 1 — 1 1 tOO 

ATC AATAATT ATCATT AGTTAATGCCCC AGTAATC AAGT ATCGGGT ATATACCTCAAGGCGC AATGTATTGAA TGCCATTTACCGGGCGGACCGAC TGGC 

LL I V INYGVI SS .Pt YGVPRYI TYGKVPAVLT 

Aat II Aat tl 

I ! 

C'CAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCCATTGACGTCAATGGGTGGAGTATTTACGGT 

i i 1 i ' 1 1 1 1 i ' ' 1 1 1 1 1 1 ' 1 i- 200 

GG3TTGCTGGGGGCGGGTAACTGCAGTTATTACTGCATACAAGGGTATCATTGCGGTTATCCCTGAAAGGTAACTGCAGTTACCCACCTCATAAATGCCA 

AQRPPPIDVNNDVCSHSNANRDFPLTSMGGVFTV 

Bgl I Nde t Aat II Bgl I 

\ I if 

\ .> Ar r^rrr ArTTrrr att Ar ATr a a r; TfZT a TT a T iTGC C A Afi T AC GCCC CCT AT TG ACGTCAATGACGGTAAATGGCCCGCCTGGC AT TATGCCC AG T A 

^-,w.~ ' '"T" 1 i i i i 1 ■ i t ■ ■ ■ I • 1 ■ ■ 1 ■ ■ . ' i 1 ■ ' 1 ■ : 300 

TTTGACGGGTGAACCGTCATGTAGTTCACATAGTATACGGTTCATGCGGGGGATAACTGCAGTTACTGCCATTTACCGGGCGGACCGTAATACGGGTCAT 

NCPLGSTSSVS YAKYAPY R 0 . R MARLAL CPV 

SnaB I Nco I 

CATGACCTTATGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGCAGTACATCAATGGGCGTGGA 

I ) i | I i | 1 I 1 I I ■ t ' t I I ' i '-CO 

GTACTGGAATACCCTGAAAGGATGAACCGTCATGTAGATGCATAATCAGTAGCGATAATGGTACCACTACGCCAAAACCGTCATGTAGTTACCCGCACCT 
HDLMGLSYLA VHLR 1SHRYYHG0AVLAVHQVAV 

Aat II 

TAGCGGTTTGACTC ACGGGGATTTCC AAGTCTCCACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTA 
ATCGCCAAACTGAGTGCCCCTAAAGGTTCAGAGGTGGGGTAACTGC AGTT ACCCTCAAACAAAACCGTGGTTT TAGTTGCCCTGAAAGGTTTTACAGCAT 
I A V L T G I SKSPPH. RQVEFVLAPKSTGLSKMS 

Nhe I Ec47 

I j 

AC A AC TCCGCCCCATTGAC GC AAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGC AGAGCTGGTTT AGTGAACCGTC AGATCCGCTAGCGCTA 

, , I,, i i i i . i | i i - — - — - ( - | ■ - ...... ] . . £ 

TGTTGAGGCGGGGTAACTGCGTTTACCCGCCATCCGCACATGCCACCCTCCAGATATATTCGTCTCGACCAAATCACTTGGCAGTCTAGGCGATCGCGAT 

OLRP tOANGR ACTVGGLYKQSVF3EPSDPLAL 

I, - . ■ - • t i . . — . . j . . ...... ... ^ — . , . , .. 

Nco t 

CCGGTCGCCACCATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCG 

1 1 i , 1 1 . 1 > I 1 i ■ ■ I 700 

GGCCA3CGGTGGTACCACTCGTTCCCGCTCCTC3ACAAGrGGCCCCACCACGGGTAGGACCAGCTCGACCTGCCGCTGCATTTGCCGGTGTTCAAGTCGC 



P VATMVSKGEELFTGVVP ilveldgdvnghkf > 

"GTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGC TGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCC TGGCCC ACCCTCGTGAw 

, 1 ■ . i I 1 ■ 1 ' 1 f ' > 1 cC 

ACAGGCCGCTCCCGCTCCCGCTACGGTGGATGCCGTTCGACTGGGACTTCAAGTAGACGTGGTGGCCGTTCGACGGGCACGGGACCGGGTGGGAGCACT3 



VSGEGEGOATYG'<LTLXF ICTTGKLPVPWPTLVT 
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CACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCC^ 



T 1 T - Y G V° C F 5 R y r 0 " " « 0 H D F F K S AMPEGYVOE 



CGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTG 



AAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCAT,^ 



tag: 



" ' ' F F " ° ° G N Y K T R ' E V * F E G D T L y M R , E L K G , 
ACTTCAAGGAGGACGGC/UCATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAG. 



AACGGCATCAA 



I tec 



DFKEDGNILGH 



KLEYNYNSHNVYIM 



A 0 K Q K N G I k 



GGTGAACTTCAAGAT CCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCAT CGGCGACGGCCCC 



GTGCTGCTG 



GCACGACGAC 



VNFKIRHNIEOGSV 



QL AOHYQQNTPIGO 



G P V L L 



CCCGACAACCACTACCTGAGCACCCAGTCCGCCCTGAGC AAAGACCCC AACGAGAAGCGCGATCACATGGTfr 



TGCTGGAGTTCGTGACC2CCGCCGGGA 



AC TGGCGGCGGCCCT 



PONHYLSTQSA 



LSKDPNEKROHMV 



llefvtaag 



Asu II 
EcoN t 



pspM II Bgl II 

TCACTCTCGGCATGGACGAGCTGTACAAGTCCGGACT CAGATCTACGrCAAATGTAGAATTGATACCAATCTA CACGGATTGGGCCAATCGGCACC TTT" 



' TLGMOELY 



-C.e.uncS3 sac 



K SGLRSTSHV ELtPIYTOWANRHL 



f* u 1 EcoR I 

gaagggcagcttatcaaagtcgattagggatatttc caatgattttcg'cgactatcgactggtttctcagcttattaatg tgatcgttccgatcaacgLa 



~C.e.uncS3 sac 
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Bsm I 

i 

f 

TTCTCGCCTGCATTCACGAAACGTTTGGCAAAAATCACATCGAACCTGGATGGCCTCGAAACGTGTCTCGACTACCTGAAAAATCTGGGTCTCGACTGCT 
AAGAGCGGACGTAAGTGCTTTGC AAACCGTTTTTAGTGT AGCTTGGACCTACCGGAGCTTTGCACAGAGCTGATGGACTTTTTAGACCCAGAGC TG ACGA 



-C.e.unc53 sac 



FSPAFTKRLAK I TSNLDGLETCLDYLK NLGLDC 

Ear I 

EcoR V Pvu II Ksp632l Hind III 

1 III 
CGAAACTCACCAAAACCGATATCGACAGCGGAAACTTGGGTGCAGTTC TCCAGCTGCTCTTCCTGCTCTCC ACCTACAAGCAGAAGCTTCGGCAAC TGAA 

, 1 . 1 »- — 1 1 1 1 >— — ' 1 I ■ I — ■ — ' I ■ ■ h ! 70«" 

GCTTTGAGTGGTrTTGGCTATAGCTGTCGCCTTTGAACCCACGTCAAGAGGTCGACGAGAAGGACGAGAGGTGGATGTTCGTCTTCGAAGCCGTTGACTT 



-C.e.unc53 sac 



SKLTKTDIDSGNLGAVLQLL FLLSTYKQKLRQL* 

in * ■ I ■ I ., ■■■■■■ ■ ■ , ■ i ,-J— ,. . 1 ' ' • ■ i. ' i 

Sst It 

1 Xma I 
! ! Apa I 
I 1 I $ma I 

j j|i Bam HI ;(ba I 3d I 

AAAAGATCAGAAGAAATTGGAGCAACTACCCACATCCATTATGCCACCCGCGGGCCCGGGATCCACCGGATCTAGATAACTGATCATAATCAGCCATACC 

1 ! ' 1 ' 1 ' 1 1 1 ' I 1 1- ' 1 1 1 1 h 

TTTTCTAGTCTTCTTTAACCTCGTTGATGGGTGTAGGTAATACGGTGGGCGCCCGGGCCCTAGGTGGCCTAGATCTATTGACTAGTATTAGTCGGTATGG 

" \ 



-C.e.unc53 sac 



KDGKKLEQLPTS IMPPAGPGSTGSR , L I IISHT 

' ■ ■ ■ . ■ I ■ » .- ■ ■ 1 I L. ■ 1 - - ■ ■ ' 

Pra I Bsm I JHpa I 

! I : 

ACATTTGTAGAGGTTTTACTTGCTTTAAAAAACCTCCCACACCTCCCCCTGAACCTGAAACATAAAATGAATGCAATTGTTGTTGTTAACTTGTTTATT3 
• 1 1 1 1 1 1 > i 1 1 ■ ■ 1 ■ . i ■ > i .... i -'Z:y 

tgtaaacatctccaaaatgaacgaaattttttggagggtgtggagggggacttggactttgtattttacttacgttaacaacaacaattgaacaaataa: 
tfvevllalknlphlplnlkhkmnaivvvnlf 1 

■ — ■ 1 ■ i ■ i — . . ■ . . . . . 

Bsm I 

j 

cagcttataatggttacaaataaagcaatagcatcacaaatttca caaataaagcat-tttttcactgcattctagttgtggtttgtccaaactcatcaa 
gtcgaatattaccaatgtttatttcgttatcgtagtgtttaaagtgtttatrtcgtaaaaaaagtgacgtaagatcaacaccaaacaggtttgagtagtt 

A A Y N G Y K.SNSITNFTNKAFFSLHSSCGLSKL IN 

Mlu I Ssp I 

i | 

TGTATCTTAACGCGTAAAT TGTAAGCGTTAATATTTTGTTAAAATTCGCGTTAAAT7TTTGT TAAATCAGC TC ATTTTTTAACCAATAGGCCGAAA TCGG 

1 1 1 1 i 1 1 . 1 1 1 1 1 ■ 1 . i , h > | OC 

ACATAGAATTGCGCATTTAACATTCGCAATTATAAAACAATTTTAAGCGCAATTTAAAAACAATTTAGTCGAGTAAAAAATTGGTTATCCGGCTTTAGCC 

VS.RVNCKR.YFVKIP. VKFLLNQL. IF.PIGRMP 






WO 98/24810 252/270 



PCT/EP97/06956



Tuesday, 18 November 1997 10:34 

fig 29 pEGFPsac (1 > 5100) Site and Sequence 






Bsrl 



c ***t Tccc : TA ™ AT ^ 

gttttagggaatatttagttttcttatc tggctctatcccaactcacaacaaggtcaaaccttgttc tcaggtgataatttc ttgcaccTgaggTt^a^ 

1 K R t D R p R VECCSSLEQ£ST " ^ 



N P L 



t K E R G 



L Q R 



Dra III 



AAAGGGCGAAAAACCGTCTATCAGGGCGATGGCCCAC 

, , ^ 



TACGTGAACCATCACCCTAATCAAGTTTTTTGGGGTCG AGGTGCCGTAAAnrar Trt a «t.~^.- . 
mCCCGCTTTTTGGCAGArAGTCCCGCTACCGG G TGArGCACTTGGTAGrG G GATTAGrTC AAAAAACCCC.GCTCCA^GCATTTCG;GArr;AGCC; 



QRAKNRLSGRVPT 



TIT LIKFFGVEVP 



S T K S E 



Nae I 




SLTGKAGERGE 



KGREESERS 



G R 



G A 



GGCAAGTGTAGCGG 




pspH I 



Ear I 

: Ssp I Ksp632l 



L - f ^uvj^GATAAACAAATAAAAAGATTTATGTAAG TTTA ATAGGCGAGTACTCTGTTATTGGGACTATTTACGAAGTTATTATAAC 7 



R « P Y L F ! F 



I N T F K Y V SAH ETJTLINA 



:tttttccttc- 

s I I L K K E E 



OxaN I 



Pvu II 



G "CCTGAGGCGGAAAGAACCAGC T6TGGAATGTGTGTCAGTTAGGGTG TdCx&A&KirrrrAfznr 



Sph I 
Ava III 
Nsi I 



CASCAC TCCGCC TT TC rTGGTCGACACC TTAC AC ACAGTCAA TCCC AC ACC TTTCAGGG3 



TCCCCAGCAGGCAGAAGTATGCAAAGCATGCAT'" t 



BG SreCGAGGGGrCGTCCGTCTTCArACGTTTC6TACGT4fiAS 
G GKNOLUNVCQ LGCGKSPGSPAGRSMOSMHU 



Sph I 
I Ava III 
i Nsi I 

fl^^GCAACCAGGTGTGGAAAGrCCCCAGGCTCCC CAGCAGGCAGAA G TATGCAAAGCATGCATC rCAAr T A, Tra . 

~~.cc TTT cag G g G .cgagg;g. g;ccg.^ 

H L N . S * T , » p p 



: Bsfl NCO I 

t AACTCCGCCCATC CCGCCCCTAACTCCGCCCAGTTCCG CCC ATTCTCrnrrrrATrr 
AFTGAGGCGGGTaGGGCGGGGATTGAGGCGGGTCAAGGC 
L r P P I P P L T P P S S 



Bgl I 
Sfl I 



ctgactaattttttttatttatgcagaggccga ggccgcc n 

GGGfAAGAGGCGGGGTACCGACTGATTAAAAAAAATAAATACGrCTCCGGCrcCGGCGGc' " 
A H S P p H G .LIFF| YAEAE " 
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Stu I 

| A*r II pia I 

U I 

GG-"CTCT3AGCTATTCCAGAAGTA6TGAGGAG6CTTTTTT66AGGCCTAGGCTTTTGCAAA6ATCGATCAAGAGACA6GaTGA6GATCGTTTC6CATGAT 

■ ~ , , , i 1 ■ . i 1 1 1 1 1 1 1 1 ~< 1 1 ' ' i i xc." 

ccggagactcgataaggtcttcatcactcctccgaaaaaacctccggatccgaaaacgtttctagctagttctctgtcctactcctagcaaagcgtacta 

aselfqk. . g g f f g g l g f c k d r s r 0 r m r i v s h 0 

■ - i . ... ■ i ■ i 

3spM I Xma III 

! i 

TGAACAAGATGGATTGCACGCAGGTTCTCCGGCCGCTTGGGTGGAGAGGCTATTCGGCTATGACTGGGCACAACAGACAATCGGCTGCTCTGATGCCGCC 

, , , , , 1 ■ 1 , 1 > ■ ■ i i 1 1 1 1 ■ i h y\C\ 

ACTTGTTCTACCTAACGTGCGTCCAAGAGGCCGGCGAACCCACCTCTCCGATAAGCC6ATACTGACCCGTGTTGTCTGTTAGCCGACGAGACTACGGCGG 

TRVIARRFSGRLGGEAIRL LGTTDNRLL .CP 

Nar I 

{ Bbe I Kspl 

ii ________ _ 

GTGTTCCGGCTGTCAGCGCAGGGGCGCCCGG I TC I If iTG I CAAGaCXGACL I G I CCGGl GCCL I liAA 1 GAaC I _LAAtiALbAGliLAGCliCliG_ I A ICG I 

, 1 . 1 ► 1 < 1 ' 1 ' 1 ' 1 ' 1 ' 1 ' 1- 32a 

CACAAGGCCGACAGTCGCGTCCCCGCGGGCCAAGAAAAACAGTTCTGGCTGGACAGGCCACGGGACTTACTTGACGTTCTGCTCCGTCGCGCC6ATAGCA 

RVPAVSAGAPGSFCQDRPVRCPE. T A R R G 5 A A I V 

• ... ill. , t I , ■ , . — ■ » I ■ ■ l I I nn — — i-... _J • ■ 1- 

Fspl 

Pal I Pvu II Jth I 

! ! j i 

GGCTGGCCACGACGGGCGTTCCTTGCGCAGCTGTGCTCGACGTTGTCACTGAAGCGGGAAGGGACTGGCTGCTATTGGGCGAAGTGCCGGGGCAGGATCT 

, : , 1 , 1 ... i 1 1 1 1 . 1 1 1 1 • 1 1- 

CCGACCGGTGCTGCCCGCAAGGAACGCGTCGACACGAGCTGCAACAGTGACTTCGCCCTTCCCTGACCGACGATAACCCGCTTCACGGCCCCGTCCTAGh 
AGHOGRSLRSCARRCH .SGKGLAA IGRSA GAG3 

BspM I

i 

CCTGTCATCTCACCTTGCTCCTGCCGAGAAAGTATCCATCATGGCTGATGCAATGCGGCGGCTGCATACGCTTGATCCGGCTACCTGCCCATTCGACCAC 
GGACAGTAGAGTGGAACGAGGACGGCTCTTTCATAGGTAGTACCGACTACGTTACGCCGCCGACGTATGCGAACTAGGCCGATGGACGGGTAAGCT3GTG 
PVISPCSCRESIHHG.CNAAAAYA.SGYLPIRP 

Ear I 
Ksp632t 

j 

CAAGCGAAmCATCGCATCGAGCGAGCACGTACTCGGATGGAAGCCGGTCTTGTCGATCAGGATGATCTGGACGAAGAGCATCAGGGGCTCGCGCCASCCG 
GTTCGCTTTGTAGCGTAGCTCGCTCGTGCATGAGCCTACCTTCGGCCAGAACAGCTAGTCCTACTAGACCTGCTTCTCGTAGTCCCCGmGCGCGGTCGGC 

psetshrastysdgsrscrsg. sgrrasgarasr 

: Sph I Nco I 



aactgttcgccaggctcaaggcgagcatgcccgacggcgaggatctcgtcgtgacccatggcgatgcctgcttgccgaatarcatggtggaaaatgocc. 
ttgacaagcggtccgagttccgctcgtacgggctgccgctcctagagcagcactgggtaccgctacggacgaacggcttatagtaccaccttttaccggc 

rVROAQGEHARRRGSRRDPWRCLLAEYHG GKVP 



Ear I 
Ksp62 

I 

C "TTCTGGATTCATCGAC TGTGGCCGGCTGGGTGTGGCGGACCGCTATC AGGACATAGCGTTGGCTACCCGTGATATTGCTGAAGAGCTTGGCGGCGAk 



Nae I Bsr II Ksp632l 

1 i 1 



GAAAAGACCTAAGTAGCTGACACCGGCCGACCCACACCGCCTGGCGATAGTCCTGTATCGCAACCGATGGGCACTATAACGACTTCTCGAACCGCCoCTT 

lfwihrlwpagcggplsghsvgyp .yc . r a V p p 
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7GjGCTGACCGCTTCCTCGTGCTTTACGGTATCG CCGCTCCCGATTCGCAGCGCATCGCCTTCTATCGCCTTCTTGAC GAGTTCTrC7G^i"rf:i;(:»rTrT 
ACCCGACTGGCGAAGGAGCACGAAATGCCATAGCGGCGAGGGCTAAGCGTCGCGTAGCGGAAGATAGCGGAAGAACTGCTCAAGAAGdCTCGCCCT'" 'i'* '" 

M 5 ■ p L p R . A L R / » R S RFAAHRLLSPS 



"VLLSGTL 



Asu II 



BspM 



GGGGfTCGAAATGACCGACCAAGCGACGCCCAACCT GCC ATCACGAGATTTCGATTCC ACCGCCGCCTTCTA TGAAAGGTTGGGC TTCGC&ATrrzTTTrr 
CCCCAAGCTTTACTGGCTGGTTCGCTGCGGGTTGGACGGTAGTGCTCTAAAGC TAAG GTGGCGGCGGAAGATACTTTCCAACCCGAAGCCTTAGC * AAA" 
G f E H T 0 Q A r P M L p s R D F 0 S T A A F V E p L G F G , „ / 



33C 



Nae I 



Kspl 



Avr ll 



CGGGACGCCGGCTGGATGArCCTCCAGCGCGGGGATC TCATGCTGGAG TTCTTCGCCCACCCTAGGGGGAGGCTAACTGAAACArnnAai'i 



GCCCTGCGGCCGACCTACTAGGAGGTCGCGCCCCTAGAGTACGACCTC4 



GGAGACAATAC 

:aagaagcgggtgggatccccctccgattgactttgtgccttcctctgttat- 

RDA GVMI LQRGOLMLEFFAHPRG R L T E T R K E T . ' 



+■ toe 



|<spl 

CGGAA6GAACCCGCGCTATGACGGCAATAAAAAGACAG AATAAAACGCACGGTGTTGGGTCGTTTGTTCATAAACGC GGGGTTCGGTCCCAGGGCTfiftr^ 

GCCrTCCTTGGGCGCGATACTGCCGTTArTTTTCTGTCTTATTTTGCGTGCCACAACCC ^CAAACAAGlATTrGCGCCCCAAGCCAGGGrcCCGACCG; 
' E . G T " A M T A ' K R ° NK T " ^ V ° ,S F V H K R G y R S Q g w H 

CTCr G TCGATACCCCACCGAGACCCCArTGGGGCCAATAC GCCCGCGTTTCTTCCTTTTCCCCACCCCACCCCCCAAGT T CGGGTGAAGGCCrAG G ,r,- 

GAGACAGCTATGGGGTGGCrCTGGGGTAACCCCGGTTATGCGGGCGCAAAGAAGGAAAAGGGGTGGGGTGGGGGGTTCAAGCCCACTTCCGGGrCCCGAG 
' V 0 T P P " " " W G ° y . A R ^ S F S P P H P P S S G E G p G , 

f' Wnl P XaNI Ora. pral 
GCAGCCAACGTCG G GGCGGCAGGCCCTGCCATAGCCTCAGG TrACTCATArATACTTTAGArTGAT TT kAAACTTrA TTTT r a . TTT L.^ 



-iC": 



u. ' ^GTTGCAGCCCCGCCGTCCGGGACGGTATCGGAGTCCAATGAGTAlATATGAAATCTAA CTAAATlrTGAAGTAAAAATTAAATTTrCCTAG^T.-- 1 
AAN^VGAAGPA I A S G Y 5 Y I L IDLKLHF.FHR| "" 



BspH I 

rGAA 5 A T CCTTT T TGArAArCTCATGACCAA fl ATCCCT T AAC G TGAGrTTTCGTTCCACrGAGCGTCAGACCCCG T A,^ aA , ,-, 

^. CTAGGAA Jfl ACTATTAGAGTACTGGTTTTAGGGAATTGCACTCAAAAGCAAGGTG; CTCGCAGTCTGGGGCATCT:TTCTAGTTTrcr A GAA ,A^ " 

IP REFSFH . ASOPVEn; I K G S S ^ 

^rtu^^^^ ^^^^^^ ^ ^^^^^^^^^^^^^^^^^^^^^^^^^^GCCGGATCAAGA GCTACCAACrCTTTTT 

— ■ *-^^^VCLPDQElpT[_|t 
Bsrl 

CC "^^GTCTCGCGrCTATCG-TTATGACAGGAAGArCACATCGGCArCAATCCGG rGGrGAAGTTCrrGAGACATCGrGGCGG,- ^ 

— - * PNTVit-V . P . LG H H F K N s V A P P 
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Alwn I 

i 
i 

catacctcgctctgctaatcctgttaccagtggctgctgccagtggcgataagtcgtgtcttaccgggttggactcaagacgatagttaccggataagg: 
gtatggagcgagacgattaggacaatggtcaccgacgacggtcaccgctattcagcacagaatggcccaacctgagttctgctatcaatg5cctattcc3 
tylall illpvaaasgdkscltgldsrr lpok.c 

ApaL I 

GCAGCGGTCGGGCTGAACGGGGGGTTCGTGCACACAGCCCAGCTTGGAGCGAACGACCTACACCGAACTGAGATACCTACAGCGTGAGCTATGAGAAAGC 
, , , 1 , 1 , 1 , 1 , 1 , 1 1 1 1 1 h aaoc 

CGTCGCC AGCCCGAC TTGCCCCCCAAGCACGTGTGTCGGGTCGAACCTCGCTTGCTGGATGTGGCTTGACTCTATGGATGTCGCACTCGATACTCTTTCG 
Q R S G TGGSCTQPSL.ERTTYTELR YLQREL . E 3 

GCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGTATCCGGTAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGT 
, , , i ~ 1 i . . . 1 1 , , , : , h tsc*: 

CGGTGCGAAGGGCTTCCCTCTTTCCGCCTGTCCATAGGCCATTCGCCGTCCCAGCCTTGTCCTCTCGCGTGCTCCCTCGAAGGTCCCCCTTTGCGGACCA 
A TLPEGRKADR YPVSGRVGTGERTRELPGGNAW 

ATCTTTATAGTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGCTCGTCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACG: 
1 1 . (— h 1 . 1 1 1 . 1 , 1 , 1 ■ ■ ■ . 1 ** 50CC 

TAGAAATATCAGGACAGCCCAAAGCGGTGGAGACTGAACTCGCAGCTAAAAACACTACGAGCAGTCCCCCCGCCTCGGATACCTTTTTGCGGTCGTTGCG 
YL YSPVGFRHU. LERRFL . CSSGGRSLVKNASNA 

Ava III 
Nsi I 

GGCCfTTTTACGGTTCCTGGCCTTTTGCTGGCCTTTTGCTCACATGTTCTTTCCTGCGTTATCCCCTGATTCTGTGGATAACCGTATTACCGCC ATGCAT 

1 ' ' ' ' — — — i i 1 i — ■ — ^ 1 ■ i ■ • 1 t i ii 

CCGGAAAAATGCCAAGGACCGGAAAACGACCGGAAAACGAGTGTAC AAGAAAGGACGCAATAGGGGACTAAGACACCTATTGGCATAATGGCGG TACGTA 

AFLR FLAF CWPFAHMFFPALSPO SVONR ITAMH 
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At 



L 



TAG 



TTATTAATAGTAATCAATTACGGGGTCATTAGTTCA TAGCCCATATA ^QGAGTTCCGCGTTACATAAC TT ACGGTAAATGGCCCGCCT^i'ir Tn*rr-. 
AA TA ATTATC A TT AfiT TiATrrrrr apta a t*~ . i» 1 ' _ _ ' 1 — ' ' ' 1 I 1 j , ~ *"* 



ArCAArAATTATCATTAGTTAATGCCCCAGTAATC.AGTATC G GGT ATATACCTCAAGGCGCA,TGTAT; GA ArGCCAT;TACCGGGCGGACCGACTr.- '» 
L L .' V ' N Y 6 V ' 5 S ■ " ' ^ V P R y , T Y G K W p a vT I"' 



Aat II 

j Aat II 

CCCAACGACCCCCGCCC^^ _ 

G GG TTGC TGGGGGC GGG TAAC TGCAG TTAT TAC TGCA TACAAGGG TATCA T TGCGGTT A TCCC TGAAAGGTAAC TGCAG T TACCC ACC TC AT AAATG r '" A *» 
AQ Rppp t OVNMQ V C S H S N A N R D F 



Ide 1 



Aat II

i 



PLTSMGGVFTV 



Bgl I 



AAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCCAAGTACGCCCCCTATTn 



■ 5ACG TCAATGACGGTA AATGGCCCGCCTGGCATTATGCrrAflTa 

TTGACGGGTGAACCGTCATGTAGTTCACATAGTATACGGTTCATGCGGGGGATAACTGCA GTTACTGC^ATTTACCGGGCGGACCGTAATACGGGTCAT 300 
-^1_^_L_U_LA S ySYAKYAPY. RQ.R . MARLALCPV 



SnaB 



Nco I



C ATGACC TTATGGGAC 



TTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATnrnr: 



GTACTGGAATACCCTGAAAGGATGAACCGTCATGTAGATGCATAATCAGTAGCGATAATGGTACCAC 



TTTTGGCAGTACATCAATGGGCGTGGA 



HOLNGLSY 



LA VHLR ISHRY 



TACGCC AAAACCGTCATGTAGTTACCCGC ACCT 



AGCGGTTTGACTCACGGGGATTTCCAAGTCTCC 



Aat II

ACCCCATTGACGTCAATGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTrra,A a Arr:Trr:T.-. 



ATCGCCAAACTGAGTGCCCCTAAAGGTTCAGAGG TGGGGTAACTGCAGTTACCCTCAAACAAAACCGTGGTTTTAGTTGCCCTGAAAGGTTTTAf^ 

Q . V E F V L A P K S T G I S K M S 



AV -LTGISKSPP H .R 



Nhe I £c47 

: Z"." CCATTGACGCAAATGGGC ° GTAGGCGTG " CGGT G GGAGGTCTATAT ^ 

TuTTGAGoCGGGGrAACTGCGTTTACCCGCCATCCGCACATGCCACCCTCCAGATATATTCGTCTCG ACCAAATCACTTGGCAGTCTAGGCGATCGCGAT *» 

A C T V G G L Y K OSWFSEPSOPLAL 



0 L R P IDANGR 



Nco I



rTz AccATGGrGAGcA ^ - 

G.^CAGCGGTGGTACCACTCGTTCCCGCTCCTCGACAAGTGGrrrr. 



:ACCACGGGTAGGACCAGCTCGACCTGCCGCTGCATTTGCCGGTGTTrAAr,Tr,"- 



P V A T M V S 



KGEELFTGV 



y P ' L V E L OGOVNGHKFS 



.^■w« M . M . w « nM . M1 Mm|alMc ^^ 



-eGFP.C.e.unc53
fig 30 pEGFP72 (1 > 9697) Site and Sequence

C ACCC TG ACCTACGGCGTGCAGTGCTTCAGCCGC TACCCCGACCAC ATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCT^CGTCCAGGA^ 

i i 1 i i t i ■ i i t — » t i i i i 1 -zjr, 

GTGGGACTGGATGCCGCACGTCACGAAGTCGGCGATGGGGCTGGTGTACTTCGTCGTGCTGAAGAAGTTCAGGCGGTACGGGCTTCCGAT3CAGGTCCT: 



-eGFP.C.e.unc53 



TLTYGVQCFSRYPDHMKOHOFFKSAMPEGYVQE 

t ■ - ... . . ..... '.ill 1 l 1 ■ • , 

Kspl 

i 

CGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCG 

, . , i , 1 1 1 1 ' 1 1 1 1 ■ ' ■ 1 1 i ♦ — — i- toe-; 

GCGTGGTAGAAGAAGTTCCTGCTGCCGTTGATGTTCTGGGCGCGGCTCCACTTCAAGCTCCCGCTGTGGGACCACTTGGCGTAGCTCGACTTCCCGTAGC 



-eGFP.C.e.unc53 



RT IFFKDDGNYKTRAE VKFEGOTLVNR IELKG I 



TGAAGTTCCTCCTGCCGTTGTAGGACCCCGTGTTCGACCTCATGTTGATGTTGTCGGTGTTGCAGATATAGTACCGGCTGTTCGTCTTCTTGCCGTAGTT 



-eGFP.C.e.unc53 



DFKEOGNILGHKLEYNYNSHNVY I MADKQKNG I K 
GGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTG 

, 1 . 1 , 1 . ^ < 1 « 1 1 1 ■ ■ * 1 . \ 1 k i2o»; 

CCACrTGAAGTTCTAGGCGGTGTTGTAGCTCCTGCCGTCGCACGTCGAGCGGCTGGTGATGGTCGTCTTGTGGGGGTAGCCGCTGCCGGGGCACGACGAC 



-eGFP.C.e.unc53 



VNFK IRHN IEOGSVQLAOHYQ0NTP IGDGPVLL 

----- . - - --^ - - - - 1 L - - « ■ . . ■ . . ' . „ 

CCCGACAACCAC TACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACC3CCGCC6GGA 

» ■ I 1 ■ 1 1 I ' 1 I ' 1 1 ' I : 1 ■ ~< 1 > . ■ 1 K 

GGGCTGTTGGTGATGGACTCGTGGGTCAGGCGGGACTCGTTTCTGGGGTTGCTCTTCGCGCTAGTGTACCAGGACGACCTCAAGCACTGGCGGCGGCCCT 



-eGFP.C.e.unc53 



PDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAG 

Asu II

BspM II Bgl II EcoN I

1 i 1 

TCACTCTCGGCATGGACGAGCTGTACAAGTCCGGACTCAGATCTACGTCAAATGTAGAATTGATACCAATCTACACGGATTGGGCCAATCGGCACCrTTC 

AGTGAGAGCCGTACCTGCTCGACATGTTCAGGCC TGAGTCTAGATGCAGTTTACATCTTAAC TATGGTTAGATGTGCCTAACCCGGTTaGCCGTGGAAAo 



-eGFP.C.e.unc53 



-C.e. unc53 



I TLGMOEL YKSGLRSTSNVEL [P I YTOVANRHL3 

' ■ ■ ■ ' ■ . ■ ,1 . - — i , 1 i ■ . , . 

Nru I EcoR I 

} j 

GAAGGGCAGCTTATCAAAGTCGATTAGGGATATTTCCAATGATTTTCGCGACTATCGACTGGTTTCTCAGCTTATTAATGTGATCGTTCCGATCAACGAm 

1 ' ' ' 1 'II I I , I , I I , — i 1 , 1 I 1- h ■ " > 

cttcccgtcgaatagtttcagctaatccctataaaggttactaaaagcgctgatagctgaccaaagagtcgaataattacactagcaagg-tagttgctt 



-eGFP.C.e.unc53 



■C.e. unc53 



K G S L 5 K S I .ft D . t 3 M D F R 0 Y R L V S Q L 1 M V 1 V - t N E 
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Bsm I 

I 

TTCTCGCCTGCATTCACGAAACGTTTGGCAAAAATCACA 




FSPAFTKRLAK 



EcoR V



I t snlogle tcldy l k n l g l 



0 c 



Pvu II

I 

i 



Ear I 
pp632l 



Hind III

i 



CGAAACTCACCAAAACCGATATCGACAG^ 



GCTTTGAGTGGTTTTGGCTATAGCTGTCGCCTTTGAACCCACGTCAAGAfin 




SKLTKTOIDSGN 



-C.e. unc53 



LGAVLQLLFLLST 



YKQKLRQLK 




KOQKKLEQLP 



TS 1MPPAVSK 



LPSPRVATSAT 



GCAAC 



TAACCCAAATTCCAACTTTCCAC AAATGTCAACATCC AGGCTTCAGACTCCAC AGTCAAGAATATCGAAAATTGATTCATTAAAftATTrriT »tv- a 



CGTTGATTGGGTT TAAGGTTGAAAGGTGTTTACAG TTGTAGG TCCGAAGTCTGAGGT GTCAGT 

jeGFROaunciT" 




A TNPNSNFPQ M 



5 T SRL Q T PQ S R I S K I 0 S S K I G 



G 1 



Aat II 



AGCCAAAGACGT^GGACTTAAA^ 



TCGGTTTCTGCAGACCTGAATTTGGTGGGAGTAGTAGTTGGTGAAr,TAr.lr 



TAA 



TATTATGTTTAAGTAAGGCAGGCAGCTCGGCAAGCTCACCGTTAT TATT 




KPKTSGLKP 



PSSSTTSSNNTN 



SFRPssRSSGNNN 



BNSDOCID: <WO 982481 0A2_I_> 



WO 98/24810 259/270 PCT/EP97/06956

Ear I 

EcoR V Ksp632l Asu II 

i i 5 

TGTTGGCTCGACGATATCCACATCTGCGAAGAGCTTAGAATCATCATCAACGTACAGCTCTATTTCGAATCTAAACCGACCTACCTCCCAACTCCAAAAA 

i i -« 1 1 i . i ' ' i ■ t ' i 1 1 — 1 1 *- h 2 I C* 

ACAACCGAGCTGCTATAGGTGTAGACGCTTCTCGAATCTTAGTAGTAGTTGCATGTCGAGATAAAGCTTAGATTTGGCTGGATGGAGGGTTGAGGTTTTT 



-eGFP.C.e.uncS3 



-C.e. unc53 



V G 



S T 1STSAK SLESSSTYSS ! SNLNRPTSGLQK 



Xba I t*** 1 

\ * 
CCTTCTAGACCA CAAACCC AGCTAGTTC6TGTTGCTACAACTACAAAAATCGGAAGCTCAAAGCTAGCCGC TCCGAAAGCCGTGAGCACCCC AAAACTTG 

GGAAGATCTGGTGTTTGGGTCGATCAAGCACAACGATGTTGATGTTTTTAGCCTTCGAGTTTCGATCGGCGAGGCTTTCGGCACTCGTGGGGTTTTGAAC 



-eGFP.C.e.unc53 



-C.e. unc53 



P S 



RPOTQLVRVATTTK IGSSKLAAPKAVSTPKL 



Bsm I 

1 

CTTCTGTGAAGACTATTGGAGCAAAACAAGAGCCCGATAACAGCGGTGGTGGTGGTGGTGGAATGCTGAAATTAAAGTTATTCAGTAGCAAAAACCCATC 

. . . ■ , , iii iii i i i t ■ i ( . i i ii i i i - - - ■ * i ^ ;3 vj'. 

GAAGACACTTCTGATAACCTCGTTTTGTTCTCGGGCTATTGTCGCCACCACCACCACCACCTTACGACTTTAATTTCAATAAGTCATCGTTTTTGGGTAG 



-eGFP.C.e unc53 



■C.e. unc53 



ASVKTIGAKQEPDNSGGGGGGMLKLKLFSSKMP i; 

■ ■ i ... i * 1 ■ 1 ■ ■ i i ■ ■ ■ — ■ * ■ ■ ' ' 1 

TTCCTCATCGAATAGCCCACAACCTACGAGAAAGGCGGCGGCGGTGCCTCAACAACAAACTTTGTCGAAAATCGCTGCCCCAGTGAAAAGTGGCCTGAAG 

i , | 1 | . -4 • 1 i i 1 1 ' 1 ■ ' ' H 

AAGGAGTAGCTTATCGGGTGTTGGATGCTCTTTCCGCCGCCGCCACGGAGTTGTTGTTTGAAACAGCTTTTAGCGACGGGGTCACTTTTCACCGGACTTC 



•eGFP.C.e.uncS3 



-C.e. unc53 



5SSNSPQPTRKAAAVPGQQTLSK IA A P V K S G L Y 

; BstX I Hind III 

CCGCCGACCAGTAAG CTGGGAAG TGCCACGTCTATGTCGAAGCTTTGTACGCCAAAAGTTTCCTACCGTAAAACGGACGCCCC AATCATATCTC AACAAG 
GGCGGCTGGTCATTCGACCCTTCACGGTGCAGATACAGCTTCGAAACATGCGGTTTTCAAAGGATGGCATTTTGCCTGCGGGGTTAGTATAGAGTTGTTC 



-eGFP.C.e.unc53 



-C.e. uncS3 



D?T3KLGSATSMSKLCTPKVSYRKT0A?I t 3 Q Q 






Ear I 

Ksp632l J3spM II 

! ! 

ACTCGAAACGATGC ^ A ^^ A ^G^ A GAGTCCGGATACGCTGGATTCAACAGCACGTCGCCAACGTCATCATCGACGGA AGGTTCCCTAAGCAT 
TGAGCTT TGC TACGAGTTTCTCGTCACTTCTTCTCAGGCCTATGCGACCTAAGTTGTCGTGCAGCGGTTGCAG TAG TAGCTGCCTTCCAAGGGATTrnjl 

-eGFP.C.e.unc53 




OSKRCSKSSEEESG 



YAGFNSTSPTSSST 



E G S L S M 



Bsm I 

Sph I 
Ava III
Nsi I 



GCATTCCACATC 



TTCCAAGAGTTCAACGTCAGACGAAAAGTCTCCG TC ATCAGACGATCTTACTCTTAACGCC 



TCCATCGTGACAGCTATCAGACAGCCG 



CGTAAGGTGTAGAAG GTTCTCAAGTTGCAGTCTGCTTTTCAGAGGCAGTAGTCTGCTAGAATGAGAATTGCGGAGG TAGCAC^ 27 

-eGFP.C.e.unc53 



' "~ C.e. unc53 • 

HSTSSKSSTSOEKSPSSO 



OLTLNASIVTA1 



R Q P 



pp I 

XTAGCCGCAACACCGGTTTCTCCAAATA7TATC 



AACAAGCCTGTTGAGGAAAAACCAACACTGGCAGTGAAAGGAGTGAAAAGCACAGCGAA 



AAAAGATC 



TATCGGCGTTGTGGC CAAAGAGGTTTATAATAGTTGTTCGGACAACTCCTTTTTGGTTGTGACCGTCACTT TCCTCACTTTTCGTGTCGrTTTTTTrTA:: ^ 

-eGFP.C.e.unc53 



-C.e. uncS3 



A K K 0 



1 A A T P V S P N ' " < * V E E K P T L A V K G V K S T 

PmaCI 
Pvu II PmaCI 

! j I EcoR V 

C A C CTCCAGCTGT TCCGCCACGTGAC AC CCAGCC A AC AA TCGGAGTTG TTAGTCCAATTATGGCAC ATAAGAAGTTGACAAATGA CCCCGTGATATCTGA 

gtggaggtcgacaaggcggtgcactgtgggtcggttgtt^cctcaacaatcaggttaaIaccgtgtatIcttcaactgIttacIggggcacta^ 



PPPAVPPROT 



-C.e. unc53 



QPTIGVVSP1M 



AHKKLTMDPV I SE 



Alwn I 



aaaaccagaacctga aaagctccaatcaatgagcatcgacacgacggacgttccaccgcttccacc tctaaaatcagttgttccact^^ 

TTTTGGTCTTGGACTTTTCGAGGTTAGTTACTCGTAGCTGTGCTGCCTGCAAGGTGGCGAAGnTr.n 



TTCA 



agattttagtcaacaaggtgaattttactgaag: 




-C.e. unc53 



KP£ | p EK L0S MS IOTTOVPPLPPLK 



SVVPLKMT3 



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SpU 

atccgacaa ccaccaaVgtacgatgttcttctaaaacaa 

taggctgttggtggttgcatgctacaagaagattttgttcctttttagtgtagcggacagttcagcaaacctatactcgtcagcaggcgcagacttctga 



-eGFP.C.e.unc53 



-C.e. unc53 



I RQPPTYOVLLKQGK I TSPVKSFGYEQSSASEO 

CCATTGTGGCTCATGC6TCGGCTCAGGTGACTCCGCCGACAAAAACTTCTG6TAATC ATTCGCTGGAGAGAAGGATGGGAAAGAATAAGAC ATCAGAATC 

, , , , — _4 — „ — i , 1 ■ 1 1 ' ' i ' 1 i 1 1 « 1 ■ i 32CC 

GGTAACACCGAGTACGCAGCCGAGTCCACTGAGGCGGCTGTTTTTGAAGACCATTAGTAAGCGACCTCTCTTCCTACCCTTTCTTATTCTGTAGTCTTAG 



-eGFP.C.e.unc53 



— Co. unc53 

SIVAHASAQVTPPTKTSGNHSLERRMGKNKTSE3 
■ * — i... -.. I * ■ — . — ..*< ■ ■ ■ * — — - « ' ■ ■ — .«-—■ ^ - 

CAGCGGCTACACCTCTGACGCCGGTGTTGCGATGTGCGCCAAAATGAGGGAGAAGCTGAAAGAATACGATGACATGACTCGTCGAGCACAGAACGGCTAT 

i ■ 1 1 i i i \ .i i > 1 1 1 ■ ' i ■ 1 • " ■ »- 300-; 

GTCGCCGATGTGGAGACTGCGGCCACAACGCTACACGCGGTTTTACTCCCTCTTCGACTTTCTTATGCTACTGTACTGAGCAGCTCGTGTCTTGCCGATA 



-eGFP.C.e.unc53 



-C.e. unc53 



SGYTSDAGVAMCAKMREKLKEYDDMTRRAONGY 

A.su II : Sst I pspM tl 

CCTGACAACTTCGAAGACAGTTCCTCCTTGTCGTCTGGAATATCCGATAACAACGAGCTCGACGACATATCCACGGACGATTTGTCCGGA ^ 
GGACT6TTGAAGCTTCTGTCAAGGAGGAACAGCAGACCTTATAGGCTATTGTTGCTCGAGCTGCTGTATAGGTGCCTGCTAAACAGGCCTCATCTG TA'-C 



-eGFP.C.e.unc53 



-C.e. unc53 



PONFEOSSSLSSG ISDNNELDD I STDDLSGVOM 

. , , - ■ . I,, ■ - ■ - ■ ■ ' - 1 ■ ■ - - 

C AACAGTCGCCTCCAAACATAGCGACTATTCCC ACTTTGTTCGCCATCCC ACGTCTTCTTCCTCAAAGCCCCGAGTCCCCAGTCGGTCCTCC AC ATCAG* 
G rTGTCAGCGGAGGTTTGTATCGCTGATAAGGGTGAAAC AAGCGGTAGGGTGCAGAAGAAGGAGTTTCGGGGC TCAGGGGTC AGCCAGGAGG'GTAGTCA 



•eGFP.C.e.unc53 



C.e. unc53 — ' 

ATVASKHSDYSHFVRHPTSSSSKPR VPSRSSTSV 



Tuesday, 18 November 1997 10:34



<ho I 



591 I 

j Nar I 



0be 



CGATTCTCGATCTCGAGCAGAACAGGAGAATGTGTAC AAACTTCTGTCCCAGTGCCGAACGAGCCAACGTGGCGCCGCTGC CACC TCAACCTTCGGACAA 
GCTAAGAGCTAGAGCTCGTCTTGTCCTCTTACACATGTTTGAAGACAGGGTCACGGCTTGCTCGGTTGCACCGCG GCGACGGTGGAGTTGG ^ 

-eGFP.C.e.ur.c53 




-C.e. unc53 



° 5 R . S R A , E Q . E N V Y * L L 5 Q CRTS0RGAAATSTF6Q 



Xma I 
1 Sma I 

: : 
f i 



Pvu II 



Sal I



CATTCGCTAAGATCCCCGGGATACTCATCCTATTCTCCACACTTATCAGTGTCAGCTGATAAGGACACAATGTCTATGCACTCACAGACTAGTCGACGA; 



GTAAGCGATTCTAGGGGCCCTATGAGTAGGATAAGAGGTGTGAATAGTCACAGTCGACTATTCCTGT GTTACAGATACGTGAGTGTCTGATCAGCTGCT^ 

eGFP.C.e.unc53 



370*; 



~" " —C.e. unnga — 

H S L R S P G Y S S . Y S P H L S V S A D K Q T M S H H S Q T S R R 

CTTCTTCACAAAAACCAAGCTATTCAGGCCAATTTCATTCACTTGATCGTAAATGCCACCTTCAAGAGTTCACATCCACCGAGCACAG 
GAAGAAGTGTTTTTGGTTCGATAAGTCCGGTTAAAGTAAGTGAACTAGCATTTACGGTGGAAGTTCTCAAGTGTAGGTGGCTCGTGTCT 




" —C.e. unc53 . 

p S 5 Q K P S Y S G Q F H SLDRKCHLOEFTSTEHRM 



A A L 



Bam HI 



CTTGAGCCCGAGAC GGGTGCCGAACTCGATGTCGAAATATGATTCTTC AGGATCCTACTCGGCGCGTTCCCGAGGTGGAAGCTCTACTGGTATCTATGG A 
GAACTCGGGCTCTGCCCACGGCTTGAGC TACAGCTTTAT ACTAAGAAG ^CCTAGGATGAGCCGCGCAAGGGCTCCACCTTCGAGATGACCATAGATAr r" 



-eGFP.C.e.unc53 



-C.e. unc53 



L S P . R R V . P " S " 5 K Y D S S G S Y SARSRGGSST 



G I Y G 



p«n HI flhe I JsJde I 

GAGACGTTCCAAC TGCACAGACTATCCGATGAAAAATCCCCCGCAC ATTC TGCCAAAAGTGAGATGGGATCCC AAC TATCAC TGoCTAGCACGACAGCA^~ 
CTCTGCAAGGTTGACGTGTCTGATAGGCT ACTTTTTAGGGGGCGTGTAAGACGGTT TTCACTCTACCCTAGGG TTGATAGTGACCGATCGTGCTGTCGtI 




- Tp OLHRLSDEKS 



PAHSAKSEMGSQ 



L S L A S T T A 




PmaCI 

j PmaCI Sal I 

i I i 

ATGGATCTCTCAATGAGAAGTACGAACATGCTATTCGGGACATGGCACGTGACTTGGAGTGTTACAAGAACACTGTCGACTCACTAACCAAGAAACAGGA 

, i — 1 ■ ■ ■ . t 1 I « 1 ' ■ 111 ' > ' 1 h 

TACCTAGAGAGTTACTCTTCATGCTTGTACGATAAGCCCTGTACCGTGCACTGAACCTCACAATGTTCTTG TGACAGCTGAGTGATTGGTTCTTTG TCC" 



■eGFP.C.e.unc53 



-C.e. unc53 



GSLNEKYEHAIROMAROLECYKNTVDSLTKKQE 



Hind III Cla I Ksp632l 

1 i i 



Ear 
KspC 

i 

GAACTATGGAGCATTGTTTGATCTTTTTGAGCAAAAGCTTAGAAAACTCACTCAA CACATTGATCGATCCAACTTGAAGCCTGAAGAGGCAATACGATT: 

XJXTC GAATC TTTTGAGTG AGTTGTGTAAC TAGC7AGGTTGAACTTCGGACT7CTCCGTTATGC TAi2 



-eGFP.C.e.unc53 



-C.e. unc53 



NYGALFDLFEQKLRKLTQHIORSNLKPEEAIRF 

AGGCAGGACATTGCTCATTTGAGGGATATTAGCAATCATCTTGCATCCAACTCAGCTCATGCTAACGAAGGCGCTGGTGAGCTTCTTCGTCAACCATCTC 

i i | i | i I 1 i i i | i . i . I i 1 1 1 t 1 i i l-jOI 

TCCGTCCTGTAACGAGTAAACTCCCTATAATCGTTAG TAGAACGTAGGTTGAGTCGAGTACGATTGCTTCCGCGACCACTCGAAGAAGCAGTTGGTAGAC- 



-eGFP.C.e.unc53 



-C.e. unc53 



PQO I AHLR0ISNHLASNSAHANEGAGELLRQP3 

Ear I

Cla I Cla I .Sst I Ksp632l 

TGGAATC AGTTGCATCCCATCGATCATCGATGTCATCGTCGTCGAAAAGCAGCAAGCAGGAGAAGATCAGCTTGAGCTCGTTTGGCAAGAACAAGAAGAG 
1 i 1 1 i 1 1 — i 1 > i ' 1 ■ ' i 1 i ^ h 

accttagtcaacgtagggtagctagtagctacagtagcagcagcttttcgtcgttcgtcctcttctagtcgaactcgagcaaaccgttcttgttcttct: 



-eGFP.C,e.unc53 



-C.e. unc53 



LESVASHRSSMSSSSKSSK0EKISLSSFGKNKK3 
Bam HI Nde I BspM II

: : : 

; i : 

ctggatccgctcctcactctccaagttcaccaagaagaagaacaagaactacgacgaagcacatatgccatcaatttccggatctcaaggaactct tgac 

gacctaggcgaggagtgagaggttcaag TGGTTCTTCTTCTTGTTCTTGATGCTGCTTCGTGTATACGGTAGTTAAAGGCCTAGAGTTCCTTGAGAACT'J 



-eGFP.C.e.unc53 



-C.e. unc53 



I RSSL SKFTKKKNKNYDEAHMPS I 3 G S 0 S T L D 




Sst I ApaL I 

l ! 

AACATTGATGTGATTGAGT TGAAGC AAGAGCTCAAAG AACGCGATAGTGCACTTrACGAAGTCCGCCTTGACAATCTGGATCG TGCCCGCGAAG TTGATS 
TTGTAAC TACACTAACTCAACTTCGTTCTCGAGTTTCTTGCGCTATCACGTGAAATGCTTCAGGCGGAACTGTTAGACCTAGCACGGGCGCTTC AACTAC "^^ 



-eGFP.C.e.unc53 



-C.e. unc53 



N I DV I ELK QELKERDSALYE VRLONLDRAREVD 

* ■ ■ 1 — — 1 ■ ■ ■ * i 1 1 , i - ■ i . , j 

TTCTGAGGGAGACAGTGAACAAGTTGAAAACCGAGAACAAGCAATTAAAGAAAGAAGTGGACAAACTCACCAACGGTCCAGCCACTCGTGCTTCTTCr.-G 
1 1 1 I ' ' ■ I 1 ' 1 > ■ i 1 1 1 1 ■ h 

AAGACTCCCTCTGTCACTTGTTCAACTTTTGGCTCTTGTTCGrTAATTTCTTTCTTCACCTGTTTGAGTGGTTGCCAGGTCGGTGAGCACSAAGAAGGG: 



-eGFP.C,e.unc53 



-C.e. unc53 



v <- R E T V N K L K T E N K Q L K K E V OKLTNGPATRASSP 



CGCC TCAATTCCAGTTATC TACGACGATGAGCATGTC TATGATGCAGCGTGTAGCAGTACATCAGCTA GTCAATCTTCGAAACGATCCTCTGGC tgcaac 

gcggagttaaggtcaatagatgctgctactcgtacagatactacgtcgcacatcgtcatgtagtcgatcagttagaagctttgctaggagaccgacgtt; wc "' 



-eGFP.C.e unc53 



-Co. unc53 



ASIPVIYDOEHVYDAACSSTSASQSSKRSSG 



C N 



Pvu I 

! Hpa I EcoR V 

i i \ 
tcaatcaaggttactgtaaacgtggacatcgctggagaaatcagttcgatcgttaacccggaca aagagataatcgtaggatatcttgccatgtcaacch 

AGTTAGTTCC AATGACATTTGCACCTGTAGCGACCTCTTTAGfCAAGC TAGCAATTGGGCCTGTTTCTC TATT AGCATCCTATAGAACGGTaCAGTTGG" 



-eGFP.C.e.unc53 



-Co. unc53 



Y L A K S T 



S 1 K V TVMV D I A GE ISS tVNPDKE 1 IVG 

Cla I 

G-CAGTCATGCTGGAAAGACATTGATGTTTCTATTCrAGGACTATTTGAAGTCTACCTATCCAGAATTGATGTGGAGCATCAACTTGGAATCGATGCTC: 



CAGTCAGTACGACCTTTCTGTAACTACAAAGATAAGATCCTGATAAACTTCAGATGGATAGGTCTTAACTACACCTCGTAGTTGAACCTTAgcrACGAGC 



-eGFP.C.e.unc53 



" : C.e. unc53 — — ■ — — 

5 Q S C VK £>IOVSlLGLFEVYLSRIDVEHQLGIOAP 




Mlu I 

i 

TGATTCTATCCTTGGCrATCAAATTGGTGAACTTCGACGCGTCATTGGAGACTCCACAACCATGATAACCAGCCATCCAACrGACATTCTTACrTCCTr'A 

' 1 1 » 1 1 1 » ' 1 ' H - i I ■ i 1 1+ ^1 V 

ACTAAGATAGGAACCGATAGTTTAACCACTTGAAGCTGCGCAGTAACCTCTGAGGTGTTGGTACTATTGGTCGGTAGGTTGACTGTAAGAATGAAGGAG- 



-eGFP.C e.unc53 



-C.e. unc53 



P S I L G Y Q I G E L R R V I G D S T T MITSHPTDILTS3 

ACTACAATCCGAATGTTCATGCACGGTGCCGCACAGAGTCGCGTAG ACAGTCTGGTCCTTGATATGCTTCTTCCAAAGCAAATGATTCTCCAACTCGTCA 
TGATGTTAGGCTTACAAGTACGTGCCACGGCGTGTCTCAGCGCATCTGTCAGACCAGGAACTATACGAAGAAGGTTTCGTTTACTAAGAGGTTGAGCAG" 



-eGFP.C.e.unc53 



-C.e. unc53 



T T { R M F M H G A A Q S R V 0 S L V L D M L L P KOMILQLV 

j*at II psrl psrl ^ su tt 

AGTCAATTTTGACAGAGAGACGTCTGGTGTTAGCTGGAGCAACTGGAATTGGAAAGAGCAAAC TGGCGAAGACCCTGGCTGCTTATGTATCTATTCGAAZ 
TCAGTTAAAACTGTCTCTC TGCAGACCACAATCGACCTCGTTGACCTTAACCTTTCTCGTTTGACCGCTTC TGGGACCGACGAATACATAGATAAGCTTt: ^ 



- eGFP.C. e.unc53 



-C.e. unc53 



K S ILTERRLVLAGATG IGKSKLAKT 



LAAYVS [ R T 



aaatcaatccgaagatagtattgttaatatcag'cattcc TGAAAACAA TAAAGAAGAATTGCTTC AAGTGGAACGACGCCTGGAAAAGATCTTGAGAAGC 
TTTAGTTAGGCTTC TATCATAACAAT7ATAGTCGTAAGGACTTTTGTTATTTCTTCTTAACGAAGTTCACC TTGCTGCGGACCTTTTCTAGAAC TCTTCO 



- eGFP.C. e.unc53 



-C.e. unc53 



N Q S £ 0 S I V N I S I P E N N K E E L L QVERRLEK I LP -3 
Ava III 

Nsi I pa | 

aaagaatcatgcatcgtaat-tctagataatatcccaaagaatcgaattgcatttgtt gtatccgtttttgcaaatgtcccacttcaaaacaacgaaggt: 

TTTCTTAGTACGTAGCATTAAGATCTArTATAGGGTTTCTTAGCTTAACGTAAACAACATAGGCAAAAACGTTTACAGGGTGAAGTTTTGTTGCTTCCAG 



— C.e, uncS3 



K £ S , C { V I L 0 N I P K N R I AFVVSVFANVPLONNEG 






EcoR V 

i 

CATTTGTAGTATCCACACTCAACCGATATCAAATCCCTC^ 

gtaaacatcatacgtg tcagttggc tatagtttagggac ^cgaagtttaagtggtgttaaagttttacagtcattacagcttagcagagcttcctaagta 560 



-eGFP.C.e.unc53 



-C.e. unc53 



PFVVCTVNRYOIPELQ IHHNFKM 



svmsnrlegf I 



Ear I 

5st I Ksp632! 

cctacgttacctccgacgacgggcggtagaggatgagtatcgtctaactgtacagatgccatcagagctcttcaL atcat 
ggatgcaatggaggctgctgcccgccatctcctactcatagcagattgacatgtctacggtagtc tcgagaagttttagtaactgaagaagggt 

-eGFP.C.e.unc53 




L R Y L R R R A V E 0 E Y R L TVQMPSELFK I I0FFP IA 



Ear I 
Ksp632l 



EcoR I 

i 



Sph I 



Bam HI 



cttcaggccgtcaataattttattgagaaaacgaattctgttgatgtgacagttggtccaagagcatgWtgaactgtcct ctaactgtc^ 
gaagtccggcagttattaaaataactcttttgcttaaga 



-eGFP.C.e.unc53 



■ — C.e. unc53 - 

L Q A . V N N F 1 E * . T N S V DVTVGPRACLNCPLTVOGS 



gtgaatggttcattcgattgtggaatgagaacttcattccatatttggaacgtgttgctagagatggcaaaaaaaccttcggtcgctgca 
cacttaccaagt aagctaacaccttactcttgaagtaaggtataaaccttgcac aa cgatctctaccgtttttttggaagcc agcgacgtgaaggaagct 

-eGFP.C e.ur,c53 ~~~ 



+• 59C-: 



REWFIRLVNENFIP 



-C.e. unc53 



yler^vardgkktfgrctsf 



Bam HI



Jth I 



Jth I 



GGATCCCACCGACATCGTC fCTAAAAAATGGCCo ^GoTTCGATGGTGAAAACCCGGAGAATGTGCTCAAACGTCTTCAAC tccaagacctcgtcccgtc A 

cctagggtggct gtagcagagattttttaccggcaccaagctaccacttItggg cc tcttacacgagtttgcagaagttgaggttctgg a^ 

• eGFP.C.e.unc53 



-C.e. unc53 



° . P T . 0 1 V , S K < V P VFDGENPENVLKRLQLO 



D L 




BspM I Xho I Sph I 

; \ i 

CCTGCCAACTCATCCCGACAACACTTCAATCCCCTCGAGTCGTTGATCCAATTGCATGCTACCAAGCATCAGACCATCGACAACATTT6AACAGAAGACT 

, 1 , 1 1 1 . 1_ — ■ 1 . 1 , 1 , , ■ ; 1 t 5io<: 

GGACGGTTGAGTAGGGCTGTTGTGAAGTTAGGGGAGCTCAGCAACTAGGTTAACGTACGATGGTTCGTAGTCTGGTAGCTGTTGTAAACTrGTC TTCTGA 



-eGFP.C.e.unc53 ■ 1 



>. 



- C.e. unc53 — I 



PANSSRQHFNPLESLIQLHATKHQTIDNI.TEO 

■ - ..... . . i . . . . 

Asp 718 
! pn. 

CTAATCTTCTCTCGCCTCTCCCCCGCTTTCCTTATCTTCGTACCGGTACCTGATGATTCCCCATTTTCCCCCTTTTCCCCCCAATTTCCCAGAACCTCCT 
GAT T AGA AGAGAGCGG AGAGGGGGCG AAAGGAATAGAAGCATGGCC A7GG AC7AC 7AAGGGG7A AAAGGGGGAAAAGGGGGGT Ta AAuGG i C I 1 uGaG'jm 
SNLLSPLPR FPYLRTGT. FP IFPLFPP ISQNLL 

Xma I 

Sma I Dra t $mn I 

! I i I 

GTTCCCTTTGTTCCTAGTCCTCCCGGGTGCCGACGCCGAAGCGATTTAAAAACCTTTTTCTTTCCGAAACATTTCCCATTGCTCATTAATAGTCAAATTG 

. 1 . f 1 1 . 1 1 1 i 1 1 1 i 1 ■ 1 1- S3CC 

CAAGG6AAACAAGGATCAGGAGGGCCCACGGCTGCGGCTTCGCTAAATTTTTGGAAAAAGAAAGGCTTTGTAAAGGGTAACGAGTAATTATCAGTTTAAC 

FPLFLVLP GADAEA [ KPFSFRN ISHCSLIVKL 

Xma I 

| > Sma I 

^Cho I | ! j Bam HI Xba I Bel I 

! II 1 ! i I 

AATAAACAGTGTATGTACTTAAAAAAAAAAAAAAAAAAACTCGAGGGGGGGCCCGGGATCCACCGGATCTAGATAACTGATCATAATCAGCCATACCACA 

1 — 1 1 1 ' 1 1 1 ' 1 1 1 1 1 — i — ~ — 1 — * 1 f 1 i > 1- 

TTATTTGTCACATACATGAATTTTTTTTTTTTTTTTTTTGAGCTCCCCCCCGGGCCCTAGGTGGCCTAGATCTATTGACTAGTATTAGTCGGTATGGTGT 

NKQCMYLKKKKKKLEGGPGIHR! I TOHNQPYH 

Dra I Bsm I Hpa I

I ! i 

TTTGTAGAGGTTTTACTTGCTTTAAAAAAC CTCCCACACCTCCCCCTGAACCTGAAACATAAAATGAATGCAATTGTTGTTGTTAACTTGTTTATTGCA3 

AAACATCTCCAAAATGAACGAAATTTTTTGGAGGGTGTGGAGGGGGACTTGGACTTTGTATTTTACTTACGTTAACAACAACAATTGAACAAArAACGTC 

ICRGFTCFKKPPTPPPEPET.NECNCCC L V Y C 3 

* ■ ■ ■ 1 ■ - * - . . i . . ■ . . . . , . t . i 

Bsm I

i 

CTTATAATGGTTACAAATAAAGC AATAGCA TC ACAAATTTCACAAATAAAGCATTTTTTTCACTGC ATTCTAGTTGTGGTTTGTCCAAACTCATCAATG7 
GAATArTACCAATGTTTATTTCGTTATCGTAGTGTTTAAAGTGTTTATTTCGTAAAAAAAGTGACGTAAGATCAACACCAAACAGGTTTGAGTAGTTACA 
L -VLQIKO .HHKFHK S I F F T A F LWFVQTHQC 

Mlu I Ssp I

ATCTTAACGCGTAAATTGTAAGCGTTAATATTTTGTTAAAATTCGCGTTAAATTTTTGTTAAATCAGCTCATTTTTTAACCAATAGGCCGAAATCGGCAA 

• i — — h 1 1 , ! 1 j — i , 1 , : , (. S7C-: 

TAGAATTGCGCATTTAACATTCGCAATTATAAAACAATTTTAAGCGCAATTTAAAAACAATTTAGTCGAGTAAAAAATTGGTTATCCGGCTTTAGCCGTT 
ILTRKL.ALIFC.NSR IFVKSAHFL TNRPKSA 




Bsrl 



A A 

sac: 



AA7CCCTTATAAATCAAAAGAATAGACCGAGATAGGGTTGAGTGTTGTTCCAGTTTGGAACAAGAGTCCACTATTAAAG AACGTGGACTCCAACGTCA, 

ttagggaatatttagttttcttatctggctctatcccaactcacaacaaggtcaaaccttgttctcaggtgataatttcWgcacctgaggttgcagttt 5 

K S L } N , Q K , N R p * • G . V L F Q F G T R V H Y . R T V T P T S i 

Dra III

GGGCGAAAAACCGTCTATCA6GGCGATGGCCCACTACGTGAACCATCACCCTAATCAAGTTTTT 
CCCGCTTTTTGGCAGATAGTCCCGCTACC^ 

G E * P S I R A M A H Y V N H H P N Q V F V G R G A V K H . IGT 

Nae I 

CTAAAGGGAGCCCCCGATTTAGAGCTTGACGGGGAAAGCCGGCGAACGTGGCGAGAAAGGAAGGGAAGAAAGCGAAAGGAGCGGGCGCTAGGGCGC T GGC 

gatttccctcgggggctaaatctcgaactgcccctttcggccgcttgcaccgctctttccttcccttctItcgctttcc^ 70CC 

L K G A P , 0 L E > 0. G E S ,R ft T V R E R K G R K R K £ R A L G R V 

pP* Kspl 

AAGTGTAGCGGTCACGCTGCGCGTAACCACCACACCCGCCGCGCTTAATGCGCCGCTACAGGGCGCGT^ 

TTCACATCGCCAGTGCG«KGCATTGGTGGTGT6GGCGGCGCGWTTAC6CGGC6ATGTCCCGCGCA ?liX 
^ ^ ' R S R C A . P P H P P R L M RRYRARQVALFGEMCAE 



P S P M 1 Ssp I Ksp632l 

TAAATGCTTCAATAATATTGAAA AAGGAAGAGTC 

tggggataaacaaataaaaagatttatgtaagtttatacataggcgagtactctgttattgggactattIacgaagttaItataactttItccttctcag 720,1 

P L F V Y F S K Y 1 Q I C IRS.DNNPOKCFNNI 



acccctatttgtttatttttctaaatacattcaaatatgtatccgctcatgagacaataaccctga- — 1 



E K G R V 



3ph I 

OxaN I p vu || i Ava III 

|| I Nsi I 

CTGAGGCGGAAAGAACCAGCTGTGGAArGTGTGTCAGTT^ 

GAC *^^gcc TTTCTTGGTCGACACCTTACACACAGTCAATCCCACACCTTTCAGGGGTCCGAGGGGTCGTCCGTCT^^ 

A S Q 



lrrke pavecvsv Rvvkvprlpsrqkyakh 



pph I 

! Ava III 
j Nsi I 

ta gtcagcaaccaggtgtggaaagtccccaggctccccagcaggcagaagtatgcaaagcatgcmck 

ATCMTC 6TTGGTCCACACCTTTCAGGGGTCCGA6GGGTCGTCCGTCTTCATACGTTTCGTACGTAGAGTTAA 

' 7 S " ° V ^ K V P , R L P S ft Q * . V A K H A S Q L y S N H S P A P M 

psrl jNco( Bgll 

CTCCGCCCATCCCGCCCCTAACTCCGCCCA GTTCCGCCCATTCTCCGCCCCATGGCTGACTAA^ 

gaggcgggtagggcggggattgaggcgggtcaaggcgggtaagaggcgg^ 



o A H P A P N.S A Q. FRPFSAPWLTNFFY 



lcrgrgrlg 




Stu I 

j Avr II Cla I 

CTCTGAGCTATTCCAGAAGTAGTGAGGAGGCTTTTTTGGAGGCCTAGGCTTTTGCAAAGATCGATCAAGAGACAGGATGAGGATCGTTTCGCATGATT.JM 

, ; • - i i i— — i i i * 1 i i lit i i i 76C«'.' 

gagactcgataaggtcttcatcactcctccgaaaaaacctccggatccgaaaacgtttctagctagttctctgtcctactcctagcaaagcgtactaac7 
(_ aipevvrrlfvrprllqrsikrqdeorfa.l* 

BspM I Xma III

! i 

ACAAGATGGATTGCACGCAGGTTCTCCGGCCGCTTGGGTGGAGAGGCTATTCGGCTATGACTGGGCACAACAGACAATCGGCTGCTCTGATGCCGCCGTG 

- , j i , 1 i . ■ i — — 1 . i i i i i i i ■ i i I 77CC 

TGTTC TACCTAACGTGCGTCCAAGAGGCCGGCGAACCCACCTCTCCGATAAGCCGATACTGACCCGTGTTGTC TGTTAGCCGACGAGACTACGGCGGCAC 

m k m d c tqvlrpl gvrg ysamtghnrqsaalmppc 

1 . ■ ■ - ' , , ' ■ ' ' ' ■ 1 I 1. ■ _ .1 •■ 

Nar I 

I Bbe I f<spl 

i i ? 
ttccggctgtcagcgcaggggcgcccggttctttttgtcaagaccgacctgtccggtgccctgaatgaactgcaagacgaggcagcgcggctatcgtggc 

i 1 ■ ■ 1 1 i . >H — i — i i — ■ > i •- 1 ■ 1 ' .t... h race 

aaggccgacagtcgcgtccccgcgggccaagaaaaacagttctggctggacaggccacgggacttacttgacgttctgctccgtcgcgccgatagcaccg 

sgcqrrgarfflsrptcpvp mncktrqrgyrg 

Fsp I

Bal I i Pvu II Jth I

! til 
tggccacgacgggcgttccttgcgcagctgtgctcgacgttgtcactgaagcgggaagggactggctgctattgggcgaagtgccggggcaggatc tcct 

, , i , i . 1 1 1 1 1 1 1 1 1 i I « H 7SCC 

accggtgctgcccgcaaggaacgcgtcgacacgagctgcaacagtgacttcgcccttccctgaccgacgataacccgcttcacggccccgtcctagagga 
vprraflaqlcstlslkregtgcyvakcrgr is 

BspM I

gtcatctcaccttgctcctgccgagaaagtatccatcatggctgatgcaatgcggcggctgcatacgcttgatccggctacctgcccattcgaccaccaa 

, 1 , 1 , , , 1 , 1 . ... i ■ ■ i -+■ + 1 i i 1 h 3CCC 

cagtagagtggaacgaggacggctctttcataggtagtaccgactacgttacgccgccgacgtatgcgaactaggccgatggacgggtaagctggtggtt 
chltlllprkypsvlmqcggcirl i rlpahst7v 

1 ■ ■ ■ t ... I I.. ,. —I, ,. i ■—Li ., I ■ ... I 1 ■ ■ * ■ ■ ■ I ■ — - n t.,., I, — , .1 . ■ ■ ■ ■ . ■■ . 1 

Ear I 

Ksp632l 
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gcgaaacatcgcatcgagcgagcacgtactcggatggaagccggtcttgtcgatcaggatgatctggacgaagagcatcaggggctcgcgccagccgaac 

, i i 1 1 ■ i i 1 1 i 1 1 1 1 1 1 — — + 3ic«: 

cgctttgtagcgtagctcgctcgtgcatgagcctaccttcggccagaacagctagtcctactagacctgcttctcgtagtccccgagcgcggtcggcttg 

rn i assehvlgvkpvls irm i vtks irgsrqpn 

: Sph I Nco I 

TGTTCGCCAGGCTCAAGGCGAGCATGCCCGACGGCGAGGATCTCGTCGTGACCCATGGCGATGCCTGCTTGCCGAATATCAfGGTGGAAAATGGCCGCTT 

1 ■ . 1 1 1 1 . 1 1 1 1 1 1 1 — i ■ i 1- 32 

ACAAGCGGTCCGAGTTCCGCTCGTACGGGCTGCCGCTCCTAGAGCAGCACTGGGTACCGCTACGGACGAACGGCTTATAGTACCACCTTTTACCGGCGAA 

CSPGSRRACPTAR ISS PMAMPAC R I S W W K M A A 

- ■ ■ ■ ■ > • ■ - „ ■ ■ • ■ ■ ■ • ■ ■ - ■ * - - - 

Ear I 

Nae I Rsr II Ksp632 I

TTCTGGATTCATCGACTGTGGCCGGCTGGGTGTGGCGGACCGCTATCAGGACATAGCGTTGGCTACCCGTGATATTGCTGAAGAGCTTGGCGGCGAATGG 

AAGACCTAAGTAGCTGACACCGGCCGACCCACACCGCCTGGCGATAGTCCTGTATCGCAACCGATGGGCACTATAACGACTTCTCGAACCGCCGCTTACC
FLOSSTVAGWVVR T A I ft T .RWLPV 1 L L K 3 L A A M G 
GCTGACCGCTTCCTCGTGCTTTACGGTATCGCCGCTCCCGATTCGCAGCGCATCGCCTTCTATCGCCTTCTTGACGAGTTCTTCTGAGCGGGACTCTGGG
CGACTGGCGAAGGAGCACGAAATGCCATAGCGGCGAGGGCTAAGCGTCGCGTAGCGGAAGATAGCGGAAGAACTGCTCAAGAAGACTCGCCCTGAGACCC
L T A . S S C , F T V S P P, ' » S A S P S I A F LTSSSEROSG 

Asu II BspM I 

GTTCGAAATGACCGACCAAGCGACGCCCAACCTGCCATCACGAGATTTCGATTCCACCGCCGCCTTCTATGAAAGGTTGGGCTTCGGAATCGTTTTCCGG
CAAGCTTTACTGGCTGGTTCGCTGCGGGTTGGACGGTAGTGCTCTAAAGCTAAGGTGGCGGCGGAAGATACTTTCCAACCCGAAGCCTTAGCAAAAGGCC
V R N 0 R P S 0 A 0 P A I T R F R F H R R L L . K V G L R N R F P 

Nae I j<spl Av , „ 

GACGCCGGCTGGATGATCCTCCAGCGCGGGGATCTCATGCTGGAGTTCTTCGCCCACCCTAGGGGGAGGCTAACTGAAACACGGAAGGAGACAATACCGG

ctgcggccgacctactaggaggtcgcgcccctagagtacgacctcaagaagcgggtgggatccccctccgattgactttgtgccttcctctgttatggcc 36C< 

G R R L D 0 P P A R G S H A G V L R P P . G E A N , N T E G 0 N T G 

|<spl 

AAGGAACCCGCGCTATGACGGCAATAAAAAGACAGAATAAAACGCACGGTGTTGGGTCGTTTGTTCATAAACGCGGGGTTCGGTCCCAGGGCTGGCACTC

ttccttgggcgcgatactgccgttatttttctgtcttattttgcgtgccacaacccagcaaacaagtatttgcgccccaagccagggtcccgaccgtgag 970C

RNP RYO GNKKTE NAR CW VVCS T R G S V P G L A L 

TGTCGATACCCCACCGAGACCCCATTGGGGCCAATACGCCCGCGTTTCTTCCTTTTCCCCACCCCACCCCCCAAGTTCGGGTGAAGGCCCAGGGCTCGCA 

acagctatggggtggctctggggtaaccccggttatgcgggcgcaaagaaggaaaaggggtggggtggggggttcaagcccacttccgggtcccgagcgt M0<

C R Y P T E T P <- G , P ' « P R F F L F P T P P P K F G . R P R A R 

Alwnl pxaNI p ra , p ra , 
AGATCCTTTTTGATAATCTCATGACCAAAATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAAAGATCAAAGGATCTTCTTGAGA
° " F • • S H D 0 N P <■ T ■ V F V P L S V R P R p K D Q R I F L R

TCCTTTTTTTCTGCGCGTAATCTGCTGCTTGCAAACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAGCTACCAACTCTTTTTCCG

aggaaaaaaagacgcgcattagacgacgaacgtttgtttttttggtggcgatggtcgccaccaaacaaacggcctagttctcgatggttgagaaaaaggc
ttccattgaccgaagtcgtctcgcgtctatggtttatgacaggaagatcacatcggcatcaatccggtggtgaagttcttgagacatcgtggcggatgta 9200